Kaggle Datasets (2025): A go-to source for data scientists looking for large datasets. It hosts thousands of datasets on themes from AI to finance to health, and they are free!
Google Dataset Search: This is basically Google for datasets! It indexes millions of open datasets published across the web, and a few simple searches will surface them, for free!
UCI Machine Learning Repository: A venerable old-timer in the AI space. UCI gives researchers access to hundreds of datasets, most of them clean enough to use directly in your machine learning projects.
GitHub Open Datasets: GitHub is a favorite of developers, and it also hosts open datasets that you can share and collaborate on with others.
AWS Open Data Registry: Amazon's open data registry hosts very large datasets, including climate and weather data, genomics, and satellite imagery.
Microsoft Research Open Data: High-quality datasets from Microsoft research projects, covering areas such as natural language processing and computer vision.
Hugging Face Datasets Hub: The most relevant option for AI developers. It offers ready-to-use NLP, vision, and audio datasets that you can load through a simple API.
Data.gov (USA): More than 250,000 datasets are freely available. Topics include healthcare, education, energy, etc.
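
Several of these sources can also be queried from code rather than through the website. As a rough sketch, the Python below searches the Data.gov catalog. It assumes the catalog still exposes the standard CKAN `package_search` endpoint at `catalog.data.gov` and returns CKAN's usual JSON shape. The helper names (`build_search_url`, `extract_titles`, `search_titles`) are made up for this illustration, so treat it as a starting point rather than an official client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Data.gov's catalog exposes the standard CKAN search endpoint here.
CATALOG_SEARCH_URL = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(query, rows=5):
    """Return the search URL for a free-text query, limited to `rows` results."""
    return CATALOG_SEARCH_URL + "?" + urlencode({"q": query, "rows": rows})


def extract_titles(payload):
    """Pull dataset titles out of a CKAN-style search response."""
    results = payload.get("result", {}).get("results", [])
    return [item.get("title", "") for item in results]


def search_titles(query, rows=5):
    """Run the search over the network and return the matching titles."""
    with urlopen(build_search_url(query, rows), timeout=30) as response:
        return extract_titles(json.load(response))
```

Calling `search_titles("energy")` should return the titles of the first few matching datasets, provided the endpoint behaves as assumed. The URL building and response parsing are kept in separate functions so they can be checked without a network connection.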