ExploitDB Evolution Analysis: April 2024 vs May 2025¶

Research Analysis Overview

Research Question: How has the ExploitDB repository evolved between April 2024 and May 2025, and what changes occurred in CVE-exploit distribution patterns?

This comprehensive analysis examines the changes in exploit availability, CVE coverage, and distribution patterns in the ExploitDB database over a 13-month period using corrected methodology.

Executive Summary¶

Key Findings - Updated Results

Net Exploit Growth: +291 total exploits (+0.6%)
CVE Coverage Expansion: +325 unique CVEs with exploits (+1.5%)
Data Quality Focus: 2,078 CVE assignment improvements
Distribution Stability: Maintained consistent exploit distribution patterns
Pure Growth: No file removals, only additions and improvements

Methodology & Data Sources¶

graph TB
    subgraph "Corrected Analysis Framework"
        A[ExploitDB Old<br/>April 2024<br/>46,494 exploits] --> C[Comparative Analysis]
        B[ExploitDB Current<br/>May 2025<br/>46,785 exploits] --> C
        C --> D[Distribution Changes]
        C --> E[CVE-Level Changes]
        C --> F[File-Level Changes]
    end

    subgraph "Analysis Components"
        D --> G[Category Distribution]
        E --> H[Gained/Lost CVEs]
        F --> I[Added Files Only<br/>No Removals]
    end

    style C fill:#e1f5fe
    style G fill:#f3e5f5
    style H fill:#e8f5e8
    style I fill:#e8f5e8

Key Analysis Components¶

Distribution ComparisonCVE-Level ChangesFile-Level ChangesData Quality Focus

Side-by-side comparison of CVE counts in each exploit category focusing on exploits WITH CVE assignments

Identifies specific CVEs that gained or lost exploits through file additions and CVE assignment improvements

Shows actual file additions (185 with CVE, 106 without CVE) and 2,078 CVE assignment enhancements

Emphasis on CVE assignment improvements rather than content removal

Distribution Comparison Results¶

Overall CVE-Exploit Distribution Changes¶

Distribution Analysis (Exploits WITH CVE Only)

The following table shows how CVE exploit categories have evolved, focusing on exploits with CVE assignments:

Category	Old CVEs	Current CVEs	CVE Diff	Old Exploits	Current Exploits	% Change
1 Exploit	19,392	19,806	+414	19,392	19,806	+2.1%
2 Exploits	1,837	1,777	-60	3,674	3,554	-3.3%
3 Exploits	498	476	-22	1,494	1,428	-4.4%
4-5 Exploits	271	267	-4	1,154	1,138	-1.5%
6-10 Exploits	106	103	-3	759	736	-2.8%
11-15 Exploits	18	18	0	218	216	0.0%
16+ Exploits	8	8	0	216	215	0.0%

Key Distribution Insights¶

Distribution Pattern Analysis

Growth Concentration:

Single Exploit CVEs Dominate: +414 CVEs with single exploits represent 98% of the total increase
Stable Multi-Exploit Categories: Categories with 11+ exploits remained completely stable

Distribution Refinement:

Quality over Quantity: Slight decreases in 2-10 exploit categories suggest CVE assignment refinements
Consistent High-End: Very stable pattern for CVEs with many exploits (16+ category)

CVE-Level Change Analysis¶

CVEs That Gained Exploits¶

Top CVEs with Exploit Additions

CVE ID	New Count	Change	Type
CVE-2014-7910, CVE-2014-7227, CVE-2014-7196, CVE-2014-7169, CVE-2014-62771, CVE-2014-6271, CVE-2014-3671, CVE-2014-3659	12	+12	Multiple CVE Assignment
CVE-2014-6352, CVE-2014-4114	6	+6	Multiple CVE Assignment
CVE-2011-4275, CVE-2009-4140	6	+6	Multiple CVE Assignment
CVE-2007-1765, CVE-2007-0038	6	+6	Multiple CVE Assignment
CVE-2004-2513, CVE-2004-1211	5	+5	Multiple CVE Assignment
CVE-2007-0038, CVE-2007-1765	5	+5	Multiple CVE Assignment
CVE-2018-8736, CVE-2018-8735, CVE-2018-8734, CVE-2018-8733	5	+5	Multiple CVE Assignment
CVE-2009-1330, CVE-2009-1329, CVE-2009-1328, CVE-2009-1327, CVE-2009-1326, CVE-2009-1325, CVE-2009-1324	5	+5	Multiple CVE Assignment
CVE-2014-0346, CVE-2014-0160	4	+4	Multiple CVE Assignment
CVE-2008-0624, CVE-2008-0623	4	+4	Multiple CVE Assignment

CVEs That Lost Exploits¶

CVEs with Exploit Count Reductions

CVE ID	Old Count	Change	Likely Cause
CVE-2006-6525	1	-1	CVE Assignment Correction
CVE-2010-4145	1	-1	CVE Assignment Correction
CVE-2008-6259	1	-1	CVE Assignment Correction
CVE-2008-0480	1	-1	CVE Assignment Correction
CVE-2007-1570	1	-1	CVE Assignment Correction
CVE-2015-5453	1	-1	CVE Assignment Correction
CVE-2015-8038	1	-1	CVE Assignment Correction
CVE-2017-12786	1	-1	CVE Assignment Correction
CVE-2013-4863	1	-1	CVE Assignment Correction
CVE-2023-5702	1	-1	CVE Assignment Correction

Notable Patterns in CVE Changes¶

CVE Change Analysis Insights

Multiple CVE Assignments: Most exploit gains involved multiple CVE IDs assigned to existing exploits, indicating improved CVE mapping accuracy.

Historical Vulnerability Research: Gained exploits primarily target vulnerabilities from 2004-2018, showing continued research into legacy systems.

CVE Assignment Corrections: All CVE losses involved single exploits, representing precision improvements rather than content removal.

File-Level Change Analysis¶

Pure Growth Pattern¶

pie title File Changes (Total: 291)
    "Added (with CVE)" : 185
    "Added (without CVE)" : 106
    "Removed" : 0

File Change Metrics (No Removals Detected)

Metric	Count	Percentage
Files Added (with CVE)	185	63.6%
Files Added (without CVE)	106	36.4%
Files Removed	0	0.0%
CVE Assignment Changes	2,078	Major Quality Focus

Examples of New Exploit Files¶

Examples of Added Files (WITH CVE)

Recent Hardware Exploits:

exploits/hardware/local/52242.txt (CVE: CVE-2021-33216) by ub3rsick
exploits/hardware/local/52244.txt (CVE: CVE-2023-26602) by ub3rsick
exploits/hardware/remote/52119.NA (CVE: CVE-2024-9054) by Armando Huesca Prida
exploits/hardware/remote/52120.NA (CVE: CVE-2024-43687) by Armando Huesca Prida
exploits/hardware/remote/52122.NA (CVE: CVE-2024-7801) by Armando Huesca Prida

CVE Assignment Enhancement Examples¶

Examples of CVE Assignment Improvements

Enhanced CVE Mappings (Same Files, Better CVE Coverage):

File	Old CVE	New CVE
`exploits/aix/local/1046.c`	CVE-2005-2236	CVE-2005-2236, CVE-2005-2232
`exploits/aix/local/33725.txt`	CVE-2014-3977	CVE-2014-3977, CVE-2012-2179
`exploits/aix/webapps/21319.txt`	CVE-2012-2996	CVE-2012-2996, CVE-2012-2995
`exploits/android/dos/44326.py`	CVE-2017-13262	CVE-2017-13262, CVE-2017-13261, CVE-2017-13260, CVE-2017-13258
`exploits/android/dos/44327.py`	CVE-2017-13262	CVE-2017-13262, CVE-2017-13261, CVE-2017-13260, CVE-2017-13258

Database Evolution Statistics¶

Comprehensive Database Metrics¶

Database Growth & Quality Summary

Metric	Old Database	Current Database	Net Change	Change %
Total Exploits	46,494	46,785	+291	+0.6%
Exploits with CVE	26,907	27,093	+186	+0.7%
Exploits without CVE	19,587	19,692	+105	+0.5%
Unique CVEs	22,130	22,455	+325	+1.5%

CVE Assignment Quality¶

Consistency in Data Quality

Metric	Old Database	Current Database	Change
CVE Assignment Rate	57.9%	57.9%	+0.0%
Average Exploits per CVE	1.22	1.21	-0.01

Key Insights and Implications¶

Data Quality and Growth Focus¶

Quality-Driven Evolution

No Content Removal: Zero file deletions indicate a growth-focused approach rather than cleanup operations.

CVE Assignment Excellence: 2,078 CVE assignment improvements vastly exceed new file additions (291), showing major quality enhancement focus.

Stable Quality Metrics: CVE assignment rate maintained at exactly 57.9%, indicating consistent data quality standards.

Research Value Enhancement¶

Enhanced Research Capabilities

Improved CVE Coverage: 325 additional unique CVEs now have exploit evidence, expanding research scope.

Better CVE Mapping: Multiple CVE assignments per exploit improve cross-reference capabilities for vulnerability research.

Hardware Focus: Notable addition of hardware-specific exploits addressing emerging attack surfaces.

Security Intelligence Implications¶

Threat Intelligence Value

Enhanced Precision: CVE assignment improvements provide more accurate threat intelligence mapping.

Comprehensive Coverage: Multiple CVE assignments per exploit ensure complete vulnerability landscape coverage.

Modern Threats: Recent CVE additions (2021-2024) show active tracking of current threat landscape.

Recommendations¶

For Researchers¶

Research Best Practices

Leverage Enhanced Mappings: Use the improved multiple CVE assignments for comprehensive vulnerability analysis
Track Quality Improvements: Consider the 2,078 assignment changes as indicators of improved data reliability
Focus on Growth Areas: Hardware exploits show emerging research trends worth investigation

For Security Practitioners¶

Practical Implementation

Enhanced Threat Intelligence: Utilize improved CVE mappings for more precise threat assessment
Hardware Security: Pay attention to the new hardware exploit additions in security planning
Data Reliability: Trust in consistent data quality metrics for operational decision-making

Visual Analysis¶

Comprehensive Dashboard¶

The analysis generated a 6-panel visualization showing:

CVE Distribution ComparisonNet Change AnalysisFile-Level Changes BreakdownTop Gainers/LosersDatabase StatisticsQuality Metrics

Side-by-side comparison focusing on exploits WITH CVE assignments

Positive growth patterns across most categories

Pie chart showing pure growth (185 with CVE, 106 without CVE, 0 removed)

CVEs that benefited from multiple CVE assignments vs. precision corrections

Overall growth trends across all metrics

Consistent data quality maintenance

png

Analysis Conclusion

The ExploitDB evolution from April 2024 to May 2025 demonstrates a quality-focused growth strategy. With zero file removals, 291 new additions, and 2,078 CVE assignment improvements, the database shows commitment to both content expansion and data accuracy. The stable 57.9% CVE assignment rate indicates mature data quality processes, while the 325 additional unique CVEs expand research capabilities significantly.

Full code and Queries¶

Research Analysis Template¶

Key Analysis Components:¶

Distribution Comparison: Side-by-side comparison of CVE counts in each exploit category (1 exploit, 2 exploits, etc.) between old and current versions
CVE-Level Changes: Identifies specific CVEs that gained or lost exploits, showing you exactly which vulnerabilities were affected
Exploit-Level Changes: Shows specific examples of exploits that were added or removed, helping you understand what caused the distribution changes
Visualization:

Creates charts showing:¶

Bar chart comparing old vs current distributions
Net change visualization (positive/negative changes)
Top CVEs that gained exploits
Top CVEs that lost exploits

# Try to use Modin for faster pandas operations
import os

# Set the environment variable before importing Modin
os.environ["MODIN_MEMORY"] = str(4 * 1024 * 1024 * 1024)

try:
    import modin.pandas as pd
    USE_MODIN = True
    print("Using Modin for accelerated pandas operations")
except ImportError:
    import pandas as pd
    USE_MODIN = False
    print("Using standard pandas (Modin not available)")

import duckdb
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import json
import os
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')
from matplotlib.patches import Patch
import matplotlib.patches as mpatches
from scipy import stats


# Set up high-quality plotting parameters
plt.rcParams['figure.dpi'] = 300
plt.rcParams['savefig.dpi'] = 300
plt.rcParams['savefig.format'] = 'eps'
plt.rcParams['font.size'] = 12
plt.rcParams['axes.titlesize'] = 14
plt.rcParams['axes.labelsize'] = 12
plt.rcParams['xtick.labelsize'] = 10
plt.rcParams['ytick.labelsize'] = 10
plt.rcParams['legend.fontsize'] = 10

# Global analysis period settings
ANALYSIS_END_DATE = "2024-12-31"
ANALYSIS_START_DATE = "1999-01-01"  # Set to None for all data
USE_ALL_DATA = True  # Toggle this to switch between full dataset and filtered

# Create output directory for figures
os.makedirs('figures', exist_ok=True)
#os.makedirs('parquet_data', exist_ok=True)
print(f"Analysis Period: {'All available data' if USE_ALL_DATA else f'{ANALYSIS_START_DATE} to {ANALYSIS_END_DATE}'}")

Using Modin for accelerated pandas operations
Analysis Period: All available data

Connect to Database¶

# Connect to DuckDB
#conn = duckdb.connect('../cve_consolidated.duckdb')
#print('📊 Connected to vulnerability database')

def list_duckdb_tables_and_schemas(database_path=":memory:"):
    """
    Connects to a DuckDB database, lists all tables, and prints their schemas.

    Args:
        database_path (str): The path to the DuckDB database file.
                             Use ':memory:' for an in-memory database.
    """
    try:
        # Connect to the DuckDB database
        # If the database_path does not exist, DuckDB will create it.
        # For an in-memory database, use ':memory:'.
        con = duckdb.connect(database=database_path, read_only=False)
        print(f"Successfully connected to DuckDB database: {database_path}\n")

        # Get a list of all tables in the database
        # We query the 'duckdb_tables' system catalog for table names
        tables_query = "SELECT table_name FROM duckdb_tables ORDER BY table_name;"
        tables = con.execute(tables_query).fetchall()

        if not tables:
            print("No tables found in the database.")
            return

        print("--- Tables and Their Schemas ---")
        print("--------------------------------\n")

        for table_tuple in tables:
            table_name = table_tuple[0]
            print(f"Table: {table_name}")
            print("Schema:")

            # Get the schema for each table using PRAGMA table_info
            # This provides column name, type, nullability, primary key info, etc.
            schema_query = f"PRAGMA table_info('{table_name}');"
            schema_info = con.execute(schema_query).fetchall()

            if not schema_info:
                print("  (No schema information available)")
            else:
                # Print header for schema columns
                print(f"  {'Column Name':<25} {'Data Type':<15} {'Nullable':<10} {'Default':<15} {'PK':<5}")
                print(f"  {'-'*25:<25} {'-'*15:<15} {'-'*10:<10} {'-'*15:<15} {'-'*5:<5}")

                # Print each column's information
                for col in schema_info:
                    # col structure: (cid, name, type, notnull, dflt_value, pk)
                    col_name = col[1]
                    col_type = col[2]
                    col_nullable = "YES" if col[3] == 0 else "NO" # 0 means NOT NULL, 1 means NULLABLE
                    col_default = col[4] if col[4] is not None else "NULL"
                    col_pk = "PK" if col[5] == 1 else "" # 1 means Primary Key

                    print(f"  {col_name:<25} {col_type:<15} {col_nullable:<10} {col_default:<15} {col_pk:<5}")
            print("\n") # Add a newline for separation between tables

    except duckdb.Error as e:
        print(f"An error occurred: {e}")



print("--- Listing tables and schemas for an in-memory database ---")

database_path = "../cve_consolidated.duckdb"
analysis_con = list_duckdb_tables_and_schemas(database_path)

print("\n" + "="*80 + "\n")
analysis_con = duckdb.connect(database=database_path, read_only=False)
print(f"Successfully connected to DuckDB database: {database_path}\n")

--- Listing tables and schemas for an in-memory database ---
Successfully connected to DuckDB database: ../cve_consolidated.duckdb

--- Tables and Their Schemas ---
--------------------------------

Table: capec_ref
Schema:
  Column Name               Data Type       Nullable   Default         PK   
  ------------------------- --------------- ---------- --------------- -----
  capec_id                  VARCHAR         YES        NULL                 
  name                      VARCHAR         YES        NULL                 
  abstraction               VARCHAR         YES        NULL                 
  status                    VARCHAR         YES        NULL                 
  description               VARCHAR         YES        NULL                 
  alternate_terms           VARCHAR         YES        NULL                 
  likelihood_of_attack      VARCHAR         YES        NULL                 
  typical_severity          VARCHAR         YES        NULL                 
  related_attack_patterns   VARCHAR         YES        NULL                 
  execution_flow            VARCHAR         YES        NULL                 
  prerequisites             VARCHAR         YES        NULL                 
  skills_required           VARCHAR         YES        NULL                 
  resources_required        VARCHAR         YES        NULL                 
  indicators                VARCHAR         YES        NULL                 
  consequences              VARCHAR         YES        NULL                 
  mitigations               VARCHAR         YES        NULL                 
  example_instances         VARCHAR         YES        NULL                 
  related_weaknesses        VARCHAR         YES        NULL                 
  taxonomy_mappings         VARCHAR         YES        NULL                 
  notes                     VARCHAR         YES        NULL                 
  created_at                TIMESTAMP WITH TIME ZONE YES        NULL


Table: cisco_patches
Schema:
  Column Name               Data Type       Nullable   Default         PK   
  ------------------------- --------------- ---------- --------------- -----
  advisory_id               VARCHAR         YES        NULL                 
  title                     VARCHAR         YES        NULL                 
  cve_id                    VARCHAR         YES        NULL                 
  vulnerability_title       VARCHAR         YES        NULL                 
  current_release_date      TIMESTAMP       YES        NULL                 
  initial_release_date      TIMESTAMP       YES        NULL                 
  vulnerability_release_date TIMESTAMP       YES        NULL                 
  status                    VARCHAR         YES        NULL                 
  version                   VARCHAR         YES        NULL                 
  publisher                 VARCHAR         YES        NULL                 
  publisher_category        VARCHAR         YES        NULL                 
  summary                   VARCHAR         YES        NULL                 
  details                   VARCHAR         YES        NULL                 
  cvss_score                FLOAT           YES        NULL                 
  cvss_severity             VARCHAR         YES        NULL                 
  cvss_vector               VARCHAR         YES        NULL                 
  bug_ids                   VARCHAR         YES        NULL                 
  product_id                VARCHAR         YES        NULL                 
  product_name              VARCHAR         YES        NULL                 
  product_full_path         VARCHAR         YES        NULL                 
  acknowledgments           VARCHAR         YES        NULL                 
  references                VARCHAR         YES        NULL                 
  remediations              VARCHAR         YES        NULL


Table: cve_main
Schema:
  Column Name               Data Type       Nullable   Default         PK   
  ------------------------- --------------- ---------- --------------- -----
  id                        BIGINT          YES        NULL                 
  cve_id                    VARCHAR         YES        NULL                 
  assigner_org              VARCHAR         YES        NULL                 
  state                     VARCHAR         YES        NULL                 
  description               VARCHAR         YES        NULL                 
  date_reserved             TIMESTAMP       YES        NULL                 
  date_published            TIMESTAMP       YES        NULL                 
  date_updated              TIMESTAMP       YES        NULL                 
  cvss_v2_score             FLOAT           YES        NULL                 
  cvss_v2_vector            VARCHAR         YES        NULL                 
  cvss_v3_score             FLOAT           YES        NULL                 
  cvss_v3_vector            VARCHAR         YES        NULL                 
  cvss_v3_severity          VARCHAR         YES        NULL                 
  cvss_v4_score             FLOAT           YES        NULL                 
  cvss_v4_vector            VARCHAR         YES        NULL                 
  cvss_v4_severity          VARCHAR         YES        NULL                 
  cwe_ids                   VARCHAR         YES        NULL                 
  cpes                      VARCHAR         YES        NULL                 
  vendors                   VARCHAR         YES        NULL                 
  products                  VARCHAR         YES        NULL                 
  references                VARCHAR         YES        NULL                 
  ssvc_exploitation         VARCHAR         YES        NULL                 
  ssvc_automatable          VARCHAR         YES        NULL                 
  ssvc_technical_impact     VARCHAR         YES        NULL                 
  kev_known_exploited       TINYINT         YES        NULL                 
  kev_vendor_project        VARCHAR         YES        NULL                 
  kev_product               VARCHAR         YES        NULL                 
  kev_vulnerability_name    VARCHAR         YES        NULL                 
  kev_date_added            TIMESTAMP       YES        NULL                 
  kev_short_description     VARCHAR         YES        NULL                 
  kev_required_action       VARCHAR         YES        NULL                 
  kev_due_date              TIMESTAMP       YES        NULL                 
  kev_ransomware_use        VARCHAR         YES        NULL                 
  kev_notes                 VARCHAR         YES        NULL                 
  kev_cwes                  VARCHAR         YES        NULL                 
  epss_score                FLOAT           YES        NULL                 
  epss_percentile           FLOAT           YES        NULL                 
  data_sources              VARCHAR         YES        NULL                 
  created_at                TIMESTAMP WITH TIME ZONE YES        NULL                 
  updated_at                TIMESTAMP WITH TIME ZONE YES        NULL                 
  has_exploit               TINYINT         YES        NULL                 
  exploit_count             INTEGER         YES        NULL                 
  first_exploit_date        TIMESTAMP       YES        NULL                 
  latest_exploit_date       TIMESTAMP       YES        NULL


Table: cve_main_old
Schema:
  Column Name               Data Type       Nullable   Default         PK   
  ------------------------- --------------- ---------- --------------- -----
  id                        BIGINT          YES        NULL                 
  CVE ID                    VARCHAR         YES        NULL                 
  State                     VARCHAR         YES        NULL                 
  Date Published            TIMESTAMP       YES        NULL                 
  Date Updated              TIMESTAMP       YES        NULL                 
  Date Reserved             TIMESTAMP       YES        NULL                 
  Descriptions              VARCHAR         YES        NULL                 
  Affected Products         VARCHAR         YES        NULL                 
  References                VARCHAR         YES        NULL                 
  Problem Types             VARCHAR         YES        NULL                 
  Base Severity             VARCHAR         YES        NULL                 
  Confidentiality Impact    VARCHAR         YES        NULL                 
  Integrity Impact          VARCHAR         YES        NULL                 
  Availability Impact       VARCHAR         YES        NULL                 
  CVSS 2.0 Base Score       FLOAT           YES        NULL                 
  CVSS 3.0 Base Score       FLOAT           YES        NULL                 
  CVSS 3.1 Base Score       FLOAT           YES        NULL                 
  cwe                       VARCHAR         YES        NULL                 
  EPSS                      FLOAT           YES        NULL                 
  vendors                   VARCHAR         YES        NULL                 
  Software CPES             VARCHAR         YES        NULL                 
  V Score                   FLOAT           YES        NULL


Table: cwe_ref
Schema:
  Column Name               Data Type       Nullable   Default         PK   
  ------------------------- --------------- ---------- --------------- -----
  cwe_id                    VARCHAR         YES        NULL                 
  name                      VARCHAR         YES        NULL                 
  weakness_abstraction      VARCHAR         YES        NULL                 
  status                    VARCHAR         YES        NULL                 
  description               VARCHAR         YES        NULL                 
  extended_description      VARCHAR         YES        NULL                 
  related_weaknesses        VARCHAR         YES        NULL                 
  weakness_ordinalities     VARCHAR         YES        NULL                 
  applicable_platforms      VARCHAR         YES        NULL                 
  background_details        VARCHAR         YES        NULL                 
  alternate_terms           VARCHAR         YES        NULL                 
  modes_of_introduction     VARCHAR         YES        NULL                 
  exploitation_factors      VARCHAR         YES        NULL                 
  likelihood_of_exploit     VARCHAR         YES        NULL                 
  common_consequences       VARCHAR         YES        NULL                 
  detection_methods         VARCHAR         YES        NULL                 
  potential_mitigations     VARCHAR         YES        NULL                 
  observed_examples         VARCHAR         YES        NULL                 
  functional_areas          VARCHAR         YES        NULL                 
  affected_resources        VARCHAR         YES        NULL                 
  taxonomy_mappings         VARCHAR         YES        NULL                 
  related_attack_patterns   VARCHAR         YES        NULL                 
  notes                     VARCHAR         YES        NULL                 
  created_at                TIMESTAMP WITH TIME ZONE YES        NULL


Table: exploits
Schema:
  Column Name               Data Type       Nullable   Default         PK   
  ------------------------- --------------- ---------- --------------- -----
  id                        BIGINT          YES        NULL                 
  file                      VARCHAR         YES        NULL                 
  description               VARCHAR         YES        NULL                 
  date_published            TIMESTAMP       YES        NULL                 
  author                    VARCHAR         YES        NULL                 
  type                      VARCHAR         YES        NULL                 
  platform                  VARCHAR         YES        NULL                 
  port                      DOUBLE          YES        NULL                 
  date_added                TIMESTAMP       YES        NULL                 
  date_updated              TIMESTAMP       YES        NULL                 
  verified                  BIGINT          YES        NULL                 
  codes                     VARCHAR         YES        NULL                 
  tags                      VARCHAR         YES        NULL                 
  aliases                   VARCHAR         YES        NULL                 
  screenshot_url            VARCHAR         YES        NULL                 
  application_url           VARCHAR         YES        NULL                 
  source_url                VARCHAR         YES        NULL                 
  cve_id                    VARCHAR         YES        NULL


Table: exploits_old
Schema:
  Column Name               Data Type       Nullable   Default         PK   
  ------------------------- --------------- ---------- --------------- -----
  id                        BIGINT          YES        NULL                 
  exploit_id                BIGINT          YES        NULL                 
  file                      VARCHAR         YES        NULL                 
  description               VARCHAR         YES        NULL                 
  date_published            TIMESTAMP       YES        NULL                 
  author                    VARCHAR         YES        NULL                 
  type                      VARCHAR         YES        NULL                 
  platform                  VARCHAR         YES        NULL                 
  port                      DOUBLE          YES        NULL                 
  date_added                TIMESTAMP       YES        NULL                 
  date_updated              TIMESTAMP       YES        NULL                 
  verified                  BIGINT          YES        NULL                 
  codes                     VARCHAR         YES        NULL                 
  tags                      VARCHAR         YES        NULL                 
  aliases                   VARCHAR         YES        NULL                 
  screenshot_url            VARCHAR         YES        NULL                 
  application_url           VARCHAR         YES        NULL                 
  source_url                VARCHAR         YES        NULL                 
  cve_id                    VARCHAR         YES        NULL


Table: github_advisories
Schema:
  Column Name               Data Type       Nullable   Default         PK   
  ------------------------- --------------- ---------- --------------- -----
  id                        BIGINT          YES        NULL                 
  ghsa_id                   VARCHAR         YES        NULL                 
  schema_version            VARCHAR         YES        NULL                 
  published                 TIMESTAMP       YES        NULL                 
  modified                  TIMESTAMP       YES        NULL                 
  summary                   VARCHAR         YES        NULL                 
  details                   VARCHAR         YES        NULL                 
  primary_cve               VARCHAR         YES        NULL                 
  all_cves                  VARCHAR         YES        NULL                 
  cvss_v3_score             FLOAT           YES        NULL                 
  cvss_v3_vector            VARCHAR         YES        NULL                 
  cvss_v4_score             FLOAT           YES        NULL                 
  cvss_v4_vector            VARCHAR         YES        NULL                 
  database_severity         VARCHAR         YES        NULL                 
  severity_score            FLOAT           YES        NULL                 
  cwe_ids                   VARCHAR         YES        NULL                 
  github_reviewed           BOOLEAN         YES        NULL                 
  github_reviewed_at        TIMESTAMP       YES        NULL                 
  nvd_published_at          TIMESTAMP       YES        NULL                 
  exploited                 TINYINT         YES        NULL                 
  exploitability_level      TINYINT         YES        NULL                 
  poc_available             TINYINT         YES        NULL                 
  patched                   TINYINT         YES        NULL                 
  patch_available           TINYINT         YES        NULL                 
  primary_ecosystem         VARCHAR         YES        NULL                 
  all_ecosystems            VARCHAR         YES        NULL                 
  package_ecosystem         VARCHAR         YES        NULL                 
  package_name              VARCHAR         YES        NULL                 
  package_purl              VARCHAR         YES        NULL                 
  references                VARCHAR         YES        NULL                 
  affected_ranges           VARCHAR         YES        NULL                 
  affected_versions         VARCHAR         YES        NULL                 
  created_at                TIMESTAMP WITH TIME ZONE YES        NULL                 
  updated_at                TIMESTAMP WITH TIME ZONE YES        NULL


Table: morefixes_commits
Schema:
  Column Name               Data Type       Nullable   Default         PK   
  ------------------------- --------------- ---------- --------------- -----
  hash                      VARCHAR         YES        NULL                 
  repo_url                  VARCHAR         YES        NULL                 
  author                    VARCHAR         YES        NULL                 
  committer                 VARCHAR         YES        NULL                 
  msg                       VARCHAR         YES        NULL                 
  parents                   VARCHAR         YES        NULL                 
  author_timezone           BIGINT          YES        NULL                 
  num_lines_added           BIGINT          YES        NULL                 
  num_lines_deleted         BIGINT          YES        NULL                 
  dmm_unit_complexity       DOUBLE          YES        NULL                 
  dmm_unit_interfacing      DOUBLE          YES        NULL                 
  dmm_unit_size             DOUBLE          YES        NULL                 
  merge                     BOOLEAN         YES        NULL                 
  committer_timezone        BIGINT          YES        NULL                 
  author_date               TIMESTAMP WITH TIME ZONE YES        NULL                 
  committer_date            TIMESTAMP WITH TIME ZONE YES        NULL


Table: morefixes_cve
Schema:
  Column Name               Data Type       Nullable   Default         PK   
  ------------------------- --------------- ---------- --------------- -----
  cve_id                    VARCHAR         YES        NULL                 
  published_date            VARCHAR         YES        NULL                 
  last_modified_date        VARCHAR         YES        NULL                 
  description               VARCHAR         YES        NULL                 
  nodes                     VARCHAR         YES        NULL                 
  severity                  VARCHAR         YES        NULL                 
  obtain_all_privilege      VARCHAR         YES        NULL                 
  obtain_user_privilege     VARCHAR         YES        NULL                 
  obtain_other_privilege    VARCHAR         YES        NULL                 
  user_interaction_required VARCHAR         YES        NULL                 
  cvss2_vector_string       VARCHAR         YES        NULL                 
  cvss2_access_vector       VARCHAR         YES        NULL                 
  cvss2_access_complexity   VARCHAR         YES        NULL                 
  cvss2_authentication      VARCHAR         YES        NULL                 
  cvss2_confidentiality_impact VARCHAR         YES        NULL                 
  cvss2_integrity_impact    VARCHAR         YES        NULL                 
  cvss2_availability_impact VARCHAR         YES        NULL                 
  cvss2_base_score          VARCHAR         YES        NULL                 
  cvss3_vector_string       VARCHAR         YES        NULL                 
  cvss3_attack_vector       VARCHAR         YES        NULL                 
  cvss3_attack_complexity   VARCHAR         YES        NULL                 
  cvss3_privileges_required VARCHAR         YES        NULL                 
  cvss3_user_interaction    VARCHAR         YES        NULL                 
  cvss3_scope               VARCHAR         YES        NULL                 
  cvss3_confidentiality_impact VARCHAR         YES        NULL                 
  cvss3_integrity_impact    VARCHAR         YES        NULL                 
  cvss3_availability_impact VARCHAR         YES        NULL                 
  cvss3_base_score          VARCHAR         YES        NULL                 
  cvss3_base_severity       VARCHAR         YES        NULL                 
  exploitability_score      VARCHAR         YES        NULL                 
  impact_score              VARCHAR         YES        NULL                 
  ac_insuf_info             VARCHAR         YES        NULL                 
  reference_json            VARCHAR         YES        NULL                 
  problemtype_json          VARCHAR         YES        NULL


Table: morefixes_cve_project
Schema:
  Column Name               Data Type       Nullable   Default         PK   
  ------------------------- --------------- ---------- --------------- -----
  id                        INTEGER         YES        NULL                 
  cve                       VARCHAR         YES        NULL                 
  project_url               VARCHAR         YES        NULL                 
  rel_type                  VARCHAR         YES        NULL                 
  checked                   VARCHAR         YES        NULL


Table: morefixes_cwe
Schema:
  Column Name               Data Type       Nullable   Default         PK   
  ------------------------- --------------- ---------- --------------- -----
  index                     BIGINT          YES        NULL                 
  cwe_id                    VARCHAR         YES        NULL                 
  cwe_name                  VARCHAR         YES        NULL                 
  description               VARCHAR         YES        NULL                 
  extended_description      VARCHAR         YES        NULL                 
  url                       VARCHAR         YES        NULL                 
  is_category               BOOLEAN         YES        NULL


Table: morefixes_cwe_classification
Schema:
  Column Name               Data Type       Nullable   Default         PK   
  ------------------------- --------------- ---------- --------------- -----
  cve_id                    VARCHAR         YES        NULL                 
  cwe_id                    VARCHAR         YES        NULL


Table: morefixes_file_change
Schema:
  Column Name               Data Type       Nullable   Default         PK   
  ------------------------- --------------- ---------- --------------- -----
  file_change_id            VARCHAR         YES        NULL                 
  hash                      VARCHAR         YES        NULL                 
  filename                  VARCHAR         YES        NULL                 
  old_path                  VARCHAR         YES        NULL                 
  new_path                  VARCHAR         YES        NULL                 
  change_type               VARCHAR         YES        NULL                 
  diff                      VARCHAR         YES        NULL                 
  diff_parsed               VARCHAR         YES        NULL                 
  num_lines_added           INTEGER         YES        NULL                 
  num_lines_deleted         INTEGER         YES        NULL                 
  code_after                VARCHAR         YES        NULL                 
  code_before               VARCHAR         YES        NULL                 
  nloc                      VARCHAR         YES        NULL                 
  complexity                VARCHAR         YES        NULL                 
  token_count               VARCHAR         YES        NULL                 
  programming_language      VARCHAR         YES        NULL


Table: morefixes_fixes
Schema:
  Column Name               Data Type       Nullable   Default         PK   
  ------------------------- --------------- ---------- --------------- -----
  cve_id                    VARCHAR         YES        NULL                 
  hash                      VARCHAR         YES        NULL                 
  repo_url                  VARCHAR         YES        NULL                 
  rel_type                  VARCHAR         YES        NULL                 
  score                     BIGINT          YES        NULL                 
  extraction_status         VARCHAR         YES        NULL


Table: morefixes_method_change
Schema:
  Column Name               Data Type       Nullable   Default         PK   
  ------------------------- --------------- ---------- --------------- -----
  method_change_id          VARCHAR         YES        NULL                 
  file_change_id            VARCHAR         YES        NULL                 
  name                      VARCHAR         YES        NULL                 
  signature                 VARCHAR         YES        NULL                 
  parameters                VARCHAR         YES        NULL                 
  start_line                VARCHAR         YES        NULL                 
  end_line                  VARCHAR         YES        NULL                 
  code                      VARCHAR         YES        NULL                 
  nloc                      VARCHAR         YES        NULL                 
  complexity                VARCHAR         YES        NULL                 
  token_count               VARCHAR         YES        NULL                 
  top_nesting_level         VARCHAR         YES        NULL                 
  before_change             VARCHAR         YES        NULL


Table: morefixes_repository
Schema:
  Column Name               Data Type       Nullable   Default         PK   
  ------------------------- --------------- ---------- --------------- -----
  repo_url                  VARCHAR         YES        NULL                 
  repo_name                 VARCHAR         YES        NULL                 
  description               VARCHAR         YES        NULL                 
  date_created              TIMESTAMP       YES        NULL                 
  date_last_push            TIMESTAMP       YES        NULL                 
  homepage                  VARCHAR         YES        NULL                 
  repo_language             VARCHAR         YES        NULL                 
  owner                     VARCHAR         YES        NULL                 
  forks_count               BIGINT          YES        NULL                 
  stars_count               BIGINT          YES        NULL


Table: msrc_patches
Schema:
  Column Name               Data Type       Nullable   Default         PK   
  ------------------------- --------------- ---------- --------------- -----
  title                     VARCHAR         YES        NULL                 
  release_date              TIMESTAMP       YES        NULL                 
  initial_release_date      TIMESTAMP       YES        NULL                 
  cvrf_id                   VARCHAR         YES        NULL                 
  cve_id                    VARCHAR         YES        NULL                 
  exploited_status          INTEGER         YES        NULL                 
  exploitation_potential_lsr INTEGER         YES        NULL                 
  exploitation_potential_osr INTEGER         YES        NULL                 
  publicly_disclosed        INTEGER         YES        NULL                 
  cvss_score                FLOAT           YES        NULL                 
  cvss_vector               VARCHAR         YES        NULL                 
  vuln_title                VARCHAR         YES        NULL                 
  product_id                VARCHAR         YES        NULL                 
  product_name              VARCHAR         YES        NULL                 
  product_branch            VARCHAR         YES        NULL                 
  product_cpe               VARCHAR         YES        NULL                 
  threats                   VARCHAR         YES        NULL                 
  remediations              VARCHAR         YES        NULL                 
  cwe_ids                   VARCHAR         YES        NULL                 
  notes                     VARCHAR         YES        NULL                 
  acknowledgments           VARCHAR         YES        NULL


Table: redhat_patches
Schema:
  Column Name               Data Type       Nullable   Default         PK   
  ------------------------- --------------- ---------- --------------- -----
  id                        BIGINT          YES        NULL                 
  advisory_id               VARCHAR         YES        NULL                 
  title                     VARCHAR         YES        NULL                 
  cve_id                    VARCHAR         YES        NULL                 
  cwe_id                    VARCHAR         YES        NULL                 
  vulnerability_title       VARCHAR         YES        NULL                 
  current_release_date      TIMESTAMP       YES        NULL                 
  initial_release_date      TIMESTAMP       YES        NULL                 
  discovery_date            TIMESTAMP       YES        NULL                 
  release_date              TIMESTAMP       YES        NULL                 
  status                    VARCHAR         YES        NULL                 
  version                   VARCHAR         YES        NULL                 
  publisher                 VARCHAR         YES        NULL                 
  publisher_category        VARCHAR         YES        NULL                 
  summary                   VARCHAR         YES        NULL                 
  details                   VARCHAR         YES        NULL                 
  cvss_score                FLOAT           YES        NULL                 
  cvss_severity             VARCHAR         YES        NULL                 
  cvss_vector               VARCHAR         YES        NULL                 
  threat_impact             VARCHAR         YES        NULL                 
  aggregate_severity        VARCHAR         YES        NULL                 
  product_id                VARCHAR         YES        NULL                 
  product_name              VARCHAR         YES        NULL



================================================================================

Successfully connected to DuckDB database: ../cve_consolidated.duckdb

def analyze_cve_exploit_changes_comprehensive():
    """
    COMPREHENSIVE analysis comparing CVE-exploit distributions between old and new versions
    Uses corrected methodology accounting for schema differences (exploit_id vs id)
    Focuses on exploits with CVE assignments while also tracking non-CVE exploits
    """

    print("=== COMPREHENSIVE CVE-Exploit Distribution Analysis (CORRECTED) ===")
    print("Schema: Old table uses 'exploit_id', New table uses 'id'")
    print("Focus: Exploits with CVE assignments (also tracking non-CVE exploits)")
    print("="*80)

    # 1. Get CVE exploit distribution from both tables
    print("\n1. Analyzing CVE-Exploit Distribution Changes...")

    # Current exploits table distribution (WITH CVE)
    current_distribution_query = """
    WITH cve_exploit_counts AS (
        SELECT 
            cve_id,
            COUNT(*) as exploit_count
        FROM exploits 
        WHERE cve_id IS NOT NULL AND cve_id != ''
        GROUP BY cve_id
    ),
    exploit_categories AS (
        SELECT 
            CASE 
                WHEN exploit_count = 1 THEN '1 Exploit'
                WHEN exploit_count = 2 THEN '2 Exploits'
                WHEN exploit_count = 3 THEN '3 Exploits'
                WHEN exploit_count BETWEEN 4 AND 5 THEN '4-5 Exploits'
                WHEN exploit_count BETWEEN 6 AND 10 THEN '6-10 Exploits'
                WHEN exploit_count BETWEEN 11 AND 15 THEN '11-15 Exploits'
                ELSE '16+ Exploits'
            END as exploit_category,
            exploit_count,
            cve_id
        FROM cve_exploit_counts
    )
    SELECT 
        exploit_category,
        COUNT(*) as cve_count,
        SUM(exploit_count) as total_exploits
    FROM exploit_categories
    GROUP BY exploit_category
    ORDER BY 
        CASE exploit_category
            WHEN '1 Exploit' THEN 1
            WHEN '2 Exploits' THEN 2
            WHEN '3 Exploits' THEN 3
            WHEN '4-5 Exploits' THEN 4
            WHEN '6-10 Exploits' THEN 5
            WHEN '11-15 Exploits' THEN 6
            ELSE 7
        END
    """

    # Old exploits table distribution (WITH CVE)
    old_distribution_query = """
    WITH cve_exploit_counts AS (
        SELECT 
            cve_id,
            COUNT(*) as exploit_count
        FROM exploits_old 
        WHERE cve_id IS NOT NULL AND cve_id != '' AND cve_id != 'nan'
        GROUP BY cve_id
    ),
    exploit_categories AS (
        SELECT 
            CASE 
                WHEN exploit_count = 1 THEN '1 Exploit'
                WHEN exploit_count = 2 THEN '2 Exploits'
                WHEN exploit_count = 3 THEN '3 Exploits'
                WHEN exploit_count BETWEEN 4 AND 5 THEN '4-5 Exploits'
                WHEN exploit_count BETWEEN 6 AND 10 THEN '6-10 Exploits'
                WHEN exploit_count BETWEEN 11 AND 15 THEN '11-15 Exploits'
                ELSE '16+ Exploits'
            END as exploit_category,
            exploit_count,
            cve_id
        FROM cve_exploit_counts
    )
    SELECT 
        exploit_category,
        COUNT(*) as cve_count,
        SUM(exploit_count) as total_exploits
    FROM exploit_categories
    GROUP BY exploit_category
    ORDER BY 
        CASE exploit_category
            WHEN '1 Exploit' THEN 1
            WHEN '2 Exploits' THEN 2
            WHEN '3 Exploits' THEN 3
            WHEN '4-5 Exploits' THEN 4
            WHEN '6-10 Exploits' THEN 5
            WHEN '11-15 Exploits' THEN 6
            ELSE 7
        END
    """

    current_dist = analysis_con.sql(current_distribution_query).df()
    current_dist = pd.DataFrame(current_dist)
    old_dist = analysis_con.sql(old_distribution_query).df()
    old_dist = pd.DataFrame(old_dist)

    # 2. Compare distributions side by side
    comparison_df = pd.merge(
        current_dist, old_dist, 
        on='exploit_category', 
        how='outer', 
        suffixes=('_current', '_old')
    ).fillna(0)

    comparison_df['cve_difference'] = comparison_df['cve_count_current'] - comparison_df['cve_count_old']
    comparison_df['exploit_difference'] = comparison_df['total_exploits_current'] - comparison_df['total_exploits_old']
    comparison_df['percentage_change'] = (comparison_df['cve_difference'] / comparison_df['cve_count_old'].replace(0, 1)) * 100

    print("CVE-Exploit Distribution Comparison (EXPLOITS WITH CVE ONLY):")
    print("="*140)
    print(f"{'Category':<15} {'Old CVEs':<10} {'Current CVEs':<12} {'CVE Diff':<10} {'Old Exploits':<15} {'Current Exploits':<18} {'% Change':<10}")
    print(f"{'':15} {'':10} {'':12} {'':10} {'(With CVE)':<15} {'(With CVE)':<18} {'':10}")
    print("="*140)

    for _, row in comparison_df.iterrows():
        print(f"{row['exploit_category']:<15} {int(row['cve_count_old']):<10} {int(row['cve_count_current']):<12} "
              f"{int(row['cve_difference']):+<10} {int(row['total_exploits_old']):<15} {int(row['total_exploits_current']):<18} "
              f"{row['percentage_change']:+.1f}%")

    # 3. Find CVEs that changed categories (lost or gained exploits) - CORRECTED METHODOLOGY
    print("\n\n2. Identifying CVEs with Changed Exploit Counts (CORRECTED ANALYSIS)...")

    # CVEs that lost or gained exploits (by CVE count, not file-level changes)
    cve_count_changes_query = """
    WITH old_counts AS (
        SELECT cve_id, COUNT(*) as old_count
        FROM exploits_old 
        WHERE cve_id IS NOT NULL AND cve_id != '' AND cve_id != 'nan'
        GROUP BY cve_id
    ),
    current_counts AS (
        SELECT cve_id, COUNT(*) as current_count
        FROM exploits 
        WHERE cve_id IS NOT NULL AND cve_id != ''
        GROUP BY cve_id
    )
    SELECT 
        COALESCE(o.cve_id, c.cve_id) as cve_id,
        COALESCE(o.old_count, 0) as old_count,
        COALESCE(c.current_count, 0) as current_count,
        COALESCE(c.current_count, 0) - COALESCE(o.old_count, 0) as change
    FROM old_counts o
    FULL OUTER JOIN current_counts c ON o.cve_id = c.cve_id
    WHERE COALESCE(c.current_count, 0) != COALESCE(o.old_count, 0)
    ORDER BY change DESC
    """

    exploit_changes = analysis_con.sql(cve_count_changes_query).df()

    # CVEs that gained exploits
    gained_exploits = exploit_changes[exploit_changes['change'] > 0].head(10)
    lost_exploits = exploit_changes[exploit_changes['change'] < 0].head(10)

    print("Top 10 CVEs that GAINED exploits:")
    print("-" * 60)
    print(f"{'CVE ID':<15} {'Old Count':<10} {'New Count':<10} {'Change':<8}")
    print("-" * 60)
    for _, row in gained_exploits.iterrows():
        print(f"{row['cve_id']:<15} {int(row['old_count']):<10} {int(row['current_count']):<10} {int(row['change']):+<8}")

    print("\nTop 10 CVEs that LOST exploits:")
    print("-" * 60)
    print(f"{'CVE ID':<15} {'Old Count':<10} {'New Count':<10} {'Change':<8}")
    print("-" * 60)
    for _, row in lost_exploits.iterrows():
        print(f"{row['cve_id']:<15} {int(row['old_count']):<10} {int(row['current_count']):<10} {int(row['change']):+<8}")

    # 4. CORRECTED: Analyze specific file-level changes (using proper methodology)
    print("\n\n3. File-Level Change Analysis (CORRECTED METHODOLOGY)...")

    # Method 1: Files removed (exist in old, not in new) - BY FILENAME
    removed_files_query = """
    SELECT 
        eo.exploit_id,
        eo.cve_id,
        eo.file,
        eo.description,
        eo.author,
        eo.type,
        eo.platform,
        eo.date_published
    FROM exploits_old eo
    LEFT JOIN exploits e ON eo.file = e.file
    WHERE e.file IS NULL 
    ORDER BY eo.file
    """

    # Method 2: Files added (exist in new, not in old) - BY FILENAME
    added_files_query = """
    SELECT 
        e.id,
        e.cve_id,
        e.file,
        e.description,
        e.author,
        e.type,
        e.platform,
        e.date_published
    FROM exploits e
    LEFT JOIN exploits_old eo ON e.file = eo.file
    WHERE eo.file IS NULL OR eo.file = 'nan'
    ORDER BY e.file
    """

    # Method 3: CVE assignment changes for same files
    cve_assignment_changes_query = """
    SELECT 
        e.file,
        eo.exploit_id as old_id,
        e.id as new_id,
        eo.cve_id as old_cve,
        e.cve_id as new_cve,
        eo.author as old_author,
        e.author as new_author
    FROM exploits_old eo
    INNER JOIN exploits e ON eo.file = e.file
    WHERE COALESCE(NULLIF(eo.cve_id, 'nan'), '') != COALESCE(e.cve_id, '')
    ORDER BY e.file
    """

    removed_files = analysis_con.sql(removed_files_query).df()
    added_files = analysis_con.sql(added_files_query).df()
    cve_assignment_changes = analysis_con.sql(cve_assignment_changes_query).df()

    # Separate by CVE status
    removed_files_with_cve = removed_files[
        (removed_files['cve_id'].notna()) & (removed_files['cve_id'] != '') & (removed_files['cve_id'] != 'nan')
    ]
    removed_files_without_cve = removed_files[
        (removed_files['cve_id'].isna()) | (removed_files['cve_id'] == '') & (removed_files['cve_id'] != 'nan')
    ]

    added_files_with_cve = added_files[
        (added_files['cve_id'].notna()) & (added_files['cve_id'] != '')
    ]
    added_files_without_cve = added_files[
        (added_files['cve_id'].isna()) | (added_files['cve_id'] == '')
    ]

    print("File-Level Changes Summary:")
    print("-" * 60)
    print(f"Files REMOVED (with CVE): {len(removed_files_with_cve):,}")
    print(f"Files REMOVED (without CVE): {len(removed_files_without_cve):,}")
    print(f"Files ADDED (with CVE): {len(added_files_with_cve):,}")
    print(f"Files ADDED (without CVE): {len(added_files_without_cve):,}")
    print(f"CVE assignment changes (same files): {len(cve_assignment_changes):,}")
    print()
    print(f"Net file change (with CVE): {len(added_files_with_cve) - len(removed_files_with_cve):+,}")
    print(f"Net file change (without CVE): {len(added_files_without_cve) - len(removed_files_without_cve):+,}")
    print(f"Total net file change: {len(added_files) - len(removed_files):+,}")

    # Show examples of removed/added files WITH CVE
    print("\nExamples of REMOVED files (WITH CVE):")
    print("-" * 80)
    for _, row in removed_files_with_cve.head(5).iterrows():
        print(f"  - {row['file']} (CVE: {row['cve_id']}) ({row['type']}) by {row['author']}")

    print("\nExamples of ADDED files (WITH CVE):")
    print("-" * 80)
    for _, row in added_files_with_cve.head(5).iterrows():
        print(f"  + {row['file']} (CVE: {row['cve_id']}) ({row['type']}) by {row['author']}")

    # Show examples of CVE assignment changes
    if len(cve_assignment_changes) > 0:
        print("\nExamples of CVE Assignment Changes (Same Files):")
        print("-" * 80)
        print(f"{'File':<30} {'Old CVE':<15} {'New CVE':<15}")
        print("-" * 80)
        for _, row in cve_assignment_changes.head(5).iterrows():
            # Fix the 'nan' display issue
            old_cve = row['old_cve'] if pd.notna(row['old_cve']) and row['old_cve'] != '' and row['old_cve'] != 'nan' else 'None'
            new_cve = row['new_cve'] if pd.notna(row['new_cve']) and row['new_cve'] != '' else 'None'
            file_short = row['file'][:28] + '...' if len(str(row['file'])) > 30 else str(row['file'])
            print(f"{file_short:<30} {old_cve:<15} {new_cve:<15}")

    # 5. Comprehensive database statistics
    print("\n\n4. Comprehensive Database Statistics...")

    # Get total counts for both WITH and WITHOUT CVE
    database_stats_query = """
    WITH old_stats AS (
        SELECT 
            COUNT(*) as total_exploits,
            COUNT(CASE WHEN cve_id IS NOT NULL AND cve_id != '' AND cve_id != 'nan' THEN 1 END) as exploits_with_cve,
            COUNT(CASE WHEN cve_id IS NULL OR cve_id = '' OR cve_id = 'nan' THEN 1 END) as exploits_without_cve,
            COUNT(DISTINCT CASE WHEN cve_id IS NOT NULL AND cve_id != '' AND cve_id != 'nan' THEN cve_id END) as unique_cves
        FROM exploits_old
    ),
    current_stats AS (
        SELECT 
            COUNT(*) as total_exploits,
            COUNT(CASE WHEN cve_id IS NOT NULL AND cve_id != '' THEN 1 END) as exploits_with_cve,
            COUNT(CASE WHEN cve_id IS NULL OR cve_id = '' THEN 1 END) as exploits_without_cve,
            COUNT(DISTINCT CASE WHEN cve_id IS NOT NULL AND cve_id != '' THEN cve_id END) as unique_cves
        FROM exploits
    )
    SELECT 
        'Old Database' as version,
        o.total_exploits,
        o.exploits_with_cve,
        o.exploits_without_cve,
        o.unique_cves
    FROM old_stats o

    UNION ALL

    SELECT 
        'Current Database' as version,
        c.total_exploits,
        c.exploits_with_cve,
        c.exploits_without_cve,
        c.unique_cves
    FROM current_stats c
    """

    db_stats = analysis_con.sql(database_stats_query).df()

    print("Database Statistics Comparison:")
    print("="*90)
    print(f"{'Version':<18} {'Total':<10} {'With CVE':<12} {'Without CVE':<14} {'Unique CVEs':<12}")
    print(f"{'':18} {'Exploits':<10} {'':12} {'':14} {'':12}")
    print("="*90)

    old_stats = db_stats[db_stats['version'] == 'Old Database'].iloc[0]
    current_stats = db_stats[db_stats['version'] == 'Current Database'].iloc[0]

    print(f"{'Old Database':<18} {old_stats['total_exploits']:<10,} {old_stats['exploits_with_cve']:<12,} {old_stats['exploits_without_cve']:<14,} {old_stats['unique_cves']:<12,}")
    print(f"{'Current Database':<18} {current_stats['total_exploits']:<10,} {current_stats['exploits_with_cve']:<12,} {current_stats['exploits_without_cve']:<14,} {current_stats['unique_cves']:<12,}")

    # Calculate changes
    print(f"{'Net Change':<18} {current_stats['total_exploits'] - old_stats['total_exploits']:+<10,} {current_stats['exploits_with_cve'] - old_stats['exploits_with_cve']:+<12,} {current_stats['exploits_without_cve'] - old_stats['exploits_without_cve']:+<14,} {current_stats['unique_cves'] - old_stats['unique_cves']:+<12,}")

    # Calculate percentages
    print(f"\nPercentage with CVE (Old): {(old_stats['exploits_with_cve']/old_stats['total_exploits'])*100:.1f}%")
    print(f"Percentage with CVE (Current): {(current_stats['exploits_with_cve']/current_stats['total_exploits'])*100:.1f}%")

    # 6. Create comprehensive visualization
    print("\n\n5. Creating Comprehensive Visualization...")

    plt.figure(figsize=(16, 12))

    # Main comparison chart (CVE distribution)
    plt.subplot(2, 3, 1)
    x = np.arange(len(comparison_df))
    width = 0.35

    plt.bar(x - width/2, comparison_df['cve_count_old'], width, 
            label='Old (April 2024)', alpha=0.8, color='lightcoral')
    plt.bar(x + width/2, comparison_df['cve_count_current'], width, 
            label='Current (May 2025)', alpha=0.8, color='skyblue')

    plt.xlabel('Exploit Categories')
    plt.ylabel('Number of CVEs')
    plt.title('CVE Distribution Comparison\n(Exploits WITH CVE Only)')
    plt.xticks(x, comparison_df['exploit_category'], rotation=45, ha='right')
    plt.legend()
    plt.grid(axis='y', alpha=0.3)

    # Net change chart
    plt.subplot(2, 3, 2)
    colors = ['red' if x < 0 else 'green' for x in comparison_df['cve_difference']]
    plt.bar(comparison_df['exploit_category'], comparison_df['cve_difference'], 
            color=colors, alpha=0.7)
    plt.xlabel('Exploit Categories')
    plt.ylabel('CVE Count Change')
    plt.title('Net Change in CVE Counts')
    plt.xticks(rotation=45, ha='right')
    plt.grid(axis='y', alpha=0.3)
    plt.axhline(y=0, color='black', linestyle='-', linewidth=0.5)

    # File-level changes pie chart
    plt.subplot(2, 3, 3)
    file_change_labels = ['Added (with CVE)', 'Added (no CVE)', 'Removed (with CVE)', 'Removed (no CVE)']
    file_change_values = [
        len(added_files_with_cve),
        len(added_files_without_cve), 
        len(removed_files_with_cve),
        len(removed_files_without_cve)
    ]
    colors_pie = ['lightgreen', 'lightblue', 'lightcoral', 'lightyellow']

    plt.pie(file_change_values, labels=file_change_labels, colors=colors_pie, 
            autopct='%1.1f%%', startangle=90)
    plt.title('File-Level Changes\nBreakdown')

    # Top CVEs that gained exploits
    plt.subplot(2, 3, 4)
    if len(gained_exploits) > 0:
        max_label_length = 15
        truncated_labels = [
            (cve[:max_label_length] + '...') if len(cve) > max_label_length else cve
            for cve in gained_exploits['cve_id']
        ]

        plt.barh(range(len(gained_exploits)), gained_exploits['change'], 
                 color='green', alpha=0.7)
        plt.yticks(range(len(gained_exploits)), truncated_labels)
        plt.xlabel('Exploits Gained')
        plt.title('Top CVEs That Gained Exploits')
        plt.grid(axis='x', alpha=0.3)

    # Top CVEs that lost exploits
    plt.subplot(2, 3, 5)
    if len(lost_exploits) > 0:
        plt.barh(range(len(lost_exploits)), abs(lost_exploits['change']), 
                color='red', alpha=0.7)
        plt.yticks(range(len(lost_exploits)), lost_exploits['cve_id'])
        plt.xlabel('Exploits Lost')
        plt.title('Top CVEs That Lost Exploits')
        plt.grid(axis='x', alpha=0.3)

    # Database statistics comparison
    plt.subplot(2, 3, 6)
    categories = ['Total Exploits', 'With CVE', 'Without CVE', 'Unique CVEs']
    old_values = [old_stats['total_exploits'], old_stats['exploits_with_cve'], 
                  old_stats['exploits_without_cve'], old_stats['unique_cves']]
    current_values = [current_stats['total_exploits'], current_stats['exploits_with_cve'], 
                      current_stats['exploits_without_cve'], current_stats['unique_cves']]

    x_pos = np.arange(len(categories))
    width = 0.35

    plt.bar(x_pos - width/2, old_values, width, label='Old', alpha=0.8, color='lightcoral')
    plt.bar(x_pos + width/2, current_values, width, label='Current', alpha=0.8, color='skyblue')

    plt.xlabel('Categories')
    plt.ylabel('Count')
    plt.title('Database Statistics\nComparison')
    plt.xticks(x_pos, categories, rotation=45, ha='right')
    plt.legend()
    plt.grid(axis='y', alpha=0.3)

    plt.tight_layout()
    plt.savefig('figures/comprehensive_cve_exploit_analysis.png', dpi=300, bbox_inches='tight')
    plt.show()

    # 7. Summary insights
    print("\n\n6. Key Insights Summary:")
    print("="*80)

    total_old_cves = comparison_df['cve_count_old'].sum()
    total_current_cves = comparison_df['cve_count_current'].sum()

    print("📊 CVE-EXPLOIT DISTRIBUTION (Primary Focus):")
    print(f"   • CVEs with exploits (Old): {int(total_old_cves):,}")
    print(f"   • CVEs with exploits (Current): {int(total_current_cves):,}")
    print(f"   • Net change in CVEs with exploits: {int(total_current_cves - total_old_cves):+,}")

    print("\n📁 FILE-LEVEL CHANGES:")
    print(f"   • Files removed (with CVE): {len(removed_files_with_cve):,}")
    print(f"   • Files added (with CVE): {len(added_files_with_cve):,}")
    print(f"   • Files removed (without CVE): {len(removed_files_without_cve):,}")
    print(f"   • Files added (without CVE): {len(added_files_without_cve):,}")
    print(f"   • CVE assignment changes: {len(cve_assignment_changes):,}")

    print("\n🔄 DATA QUALITY IMPROVEMENTS:")
    cve_percentage_old = (old_stats['exploits_with_cve']/old_stats['total_exploits'])*100
    cve_percentage_current = (current_stats['exploits_with_cve']/current_stats['total_exploits'])*100
    print(f"   • CVE assignment rate (Old): {cve_percentage_old:.1f}%")
    print(f"   • CVE assignment rate (Current): {cve_percentage_current:.1f}%")
    print(f"   • Change in CVE assignment quality: {cve_percentage_current - cve_percentage_old:+.1f}%")

    print("\n💡 INTERPRETATION:")
    if len(cve_assignment_changes) > len(added_files_with_cve):
        print("   • Major focus on data quality: CVE assignment corrections exceed new additions")
    if len(removed_files_without_cve) > len(removed_files_with_cve):
        print("   • Database cleanup: More non-CVE exploits removed than CVE-linked ones")
    if total_current_cves > total_old_cves:
        print("   • Growing CVE coverage: Net increase in unique CVEs with exploit evidence")

    # Return comprehensive data
    return {
        'comparison_df': comparison_df,
        'exploit_changes': exploit_changes,
        'removed_files_with_cve': removed_files_with_cve,
        'removed_files_without_cve': removed_files_without_cve,
        'added_files_with_cve': added_files_with_cve,
        'added_files_without_cve': added_files_without_cve,
        'cve_assignment_changes': cve_assignment_changes,
        'database_stats': db_stats
    }

# Execute the comprehensive analysis
print("🚀 Running COMPREHENSIVE ExploitDB Evolution Analysis...")
comprehensive_results = analyze_cve_exploit_changes_comprehensive()

🚀 Running COMPREHENSIVE ExploitDB Evolution Analysis...
=== COMPREHENSIVE CVE-Exploit Distribution Analysis (CORRECTED) ===
Schema: Old table uses 'exploit_id', New table uses 'id'
Focus: Exploits with CVE assignments (also tracking non-CVE exploits)
================================================================================

1. Analyzing CVE-Exploit Distribution Changes...
CVE-Exploit Distribution Comparison (EXPLOITS WITH CVE ONLY):
============================================================================================================================================
Category        Old CVEs   Current CVEs CVE Diff   Old Exploits    Current Exploits   % Change  
                                                   (With CVE)      (With CVE)                   
============================================================================================================================================
1 Exploit       19392      19806        414+++++++ 19392           19806              +2.1%
11-15 Exploits  18         18           0+++++++++ 218             216                +0.0%
16+ Exploits    8          8            0+++++++++ 216             215                +0.0%
2 Exploits      1837       1777         -60+++++++ 3674            3554               -3.3%
3 Exploits      498        476          -22+++++++ 1494            1428               -4.4%
4-5 Exploits    271        267          -4++++++++ 1154            1138               -1.5%
6-10 Exploits   106        103          -3++++++++ 759             736                -2.8%


2. Identifying CVEs with Changed Exploit Counts (CORRECTED ANALYSIS)...
Top 10 CVEs that GAINED exploits:
------------------------------------------------------------
CVE ID          Old Count  New Count  Change  
------------------------------------------------------------
CVE-2014-7910, CVE-2014-7227, CVE-2014-7196, CVE-2014-7169, CVE-2014-62771, CVE-2014-6271, CVE-2014-3671, CVE-2014-3659 0          12         12++++++
CVE-2014-6352, CVE-2014-4114 0          6          6+++++++
CVE-2011-4275, CVE-2009-4140 0          6          6+++++++
CVE-2007-1765, CVE-2007-0038 0          6          6+++++++
CVE-2004-2513, CVE-2004-1211 0          5          5+++++++
CVE-2007-0038, CVE-2007-1765 0          5          5+++++++
CVE-2018-8736, CVE-2018-8735, CVE-2018-8734, CVE-2018-8733 0          5          5+++++++
CVE-2009-1330, CVE-2009-1329, CVE-2009-1328, CVE-2009-1327, CVE-2009-1326, CVE-2009-1325, CVE-2009-1324 0          5          5+++++++
CVE-2014-0346, CVE-2014-0160 0          4          4+++++++
CVE-2008-0624, CVE-2008-0623 0          4          4+++++++

Top 10 CVEs that LOST exploits:
------------------------------------------------------------
CVE ID          Old Count  New Count  Change  
------------------------------------------------------------
CVE-2006-6525   1          0          -1++++++
CVE-2010-4145   1          0          -1++++++
CVE-2008-6259   1          0          -1++++++
CVE-2008-0480   1          0          -1++++++
CVE-2007-1570   1          0          -1++++++
CVE-2015-5453   1          0          -1++++++
CVE-2015-8038   1          0          -1++++++
CVE-2017-12786  1          0          -1++++++
CVE-2013-4863   1          0          -1++++++
CVE-2023-5702   1          0          -1++++++


3. File-Level Change Analysis (CORRECTED METHODOLOGY)...
File-Level Changes Summary:
------------------------------------------------------------
Files REMOVED (with CVE): 0
Files REMOVED (without CVE): 0
Files ADDED (with CVE): 185
Files ADDED (without CVE): 106
CVE assignment changes (same files): 2,078

Net file change (with CVE): +185
Net file change (without CVE): +106
Total net file change: +291

Examples of REMOVED files (WITH CVE):
--------------------------------------------------------------------------------

Examples of ADDED files (WITH CVE):
--------------------------------------------------------------------------------
  + exploits/hardware/local/52242.txt (CVE: CVE-2021-33216) (local) by ub3rsick
  + exploits/hardware/local/52244.txt (CVE: CVE-2023-26602) (local) by ub3rsick
  + exploits/hardware/remote/52119.NA (CVE: CVE-2024-9054) (remote) by Armando Huesca Prida
  + exploits/hardware/remote/52120.NA (CVE: CVE-2024-43687) (remote) by Armando Huesca Prida
  + exploits/hardware/remote/52122.NA (CVE: CVE-2024-7801) (remote) by Armando Huesca Prida

Examples of CVE Assignment Changes (Same Files):
--------------------------------------------------------------------------------
File                           Old CVE         New CVE        
--------------------------------------------------------------------------------
exploits/aix/local/1046.c      CVE-2005-2236   CVE-2005-2236, CVE-2005-2232
exploits/aix/local/33725.txt   CVE-2014-3977   CVE-2014-3977, CVE-2012-2179
exploits/aix/webapps/21319.txt CVE-2012-2996   CVE-2012-2996, CVE-2012-2995
exploits/android/dos/44326.py  CVE-2017-13262  CVE-2017-13262, CVE-2017-13261, CVE-2017-13260, CVE-2017-13258
exploits/android/dos/44327.py  CVE-2017-13262  CVE-2017-13262, CVE-2017-13261, CVE-2017-13260, CVE-2017-13258


4. Comprehensive Database Statistics...
Database Statistics Comparison:
==========================================================================================
Version            Total      With CVE     Without CVE    Unique CVEs 
                   Exploits                                           
==========================================================================================
Old Database       46,494     26,907       19,587         22,130      
Current Database   46,785     27,093       19,692         22,455      
Net Change         291+++++++ 186+++++++++ 105+++++++++++ 325+++++++++

Percentage with CVE (Old): 57.9%
Percentage with CVE (Current): 57.9%


5. Creating Comprehensive Visualization...

png

6. Key Insights Summary:
================================================================================
📊 CVE-EXPLOIT DISTRIBUTION (Primary Focus):
   • CVEs with exploits (Old): 22,130
   • CVEs with exploits (Current): 22,455
   • Net change in CVEs with exploits: +325

📁 FILE-LEVEL CHANGES:
   • Files removed (with CVE): 0
   • Files added (with CVE): 185
   • Files removed (without CVE): 0
   • Files added (without CVE): 106
   • CVE assignment changes: 2,078

🔄 DATA QUALITY IMPROVEMENTS:
   • CVE assignment rate (Old): 57.9%
   • CVE assignment rate (Current): 57.9%
   • Change in CVE assignment quality: +0.0%

💡 INTERPRETATION:
   • Major focus on data quality: CVE assignment corrections exceed new additions
   • Growing CVE coverage: Net increase in unique CVEs with exploit evidence