Open-sourcing SEC EDGAR on Hugging Face
r/LocalLLaMA
In the U.S. AI ecosystem, it is now more important than ever to push for the proliferation of open model and dataset releases. [Datamule], [Teraflop AI], and [Eventual] collaborated to release the [SEC-EDGAR dataset]. The dataset contains 590 GB of data, spanning 8M samples and 43B tokens from all major filings in the SEC EDGAR database. Many unofficial API providers charge hundreds of dollars a month for access to this data, and impose strict limits on top of that.
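For a feel of what those totals mean per filing, here is a quick back-of-the-envelope sketch. It only uses the rounded figures quoted above, and it assumes 1 GB = 10^9 bytes. The post does not say whether 590 GB is the compressed or uncompressed size, so treat the results as rough averages, not dataset statistics. The function name `dataset_averages` is just illustrative.

```python
# Rough per-sample averages from the rounded totals quoted in the post.
# Assumption: 1 GB = 10**9 bytes; the post does not say whether 590 GB
# is the compressed or uncompressed size.
TOTAL_BYTES = 590 * 10**9
TOTAL_SAMPLES = 8 * 10**6
TOTAL_TOKENS = 43 * 10**9


def dataset_averages(total_bytes, total_samples, total_tokens):
    """Return average bytes/sample, tokens/sample, and bytes/token."""
    return {
        "bytes_per_sample": total_bytes / total_samples,
        "tokens_per_sample": total_tokens / total_samples,
        "bytes_per_token": total_bytes / total_tokens,
    }


averages = dataset_averages(TOTAL_BYTES, TOTAL_SAMPLES, TOTAL_TOKENS)
for name, value in averages.items():
    print(f"{name}: {value:,.2f}")
```

Under those assumptions, the average sample works out to roughly 74 KB and a bit over 5,000 tokens, so many filings would not fit in a short context window without chunking.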