Science Data Service is an automated content aggregation platform focused on science news. It scrapes articles from multiple science websites, translates and summarizes them using the OpenAI GPT API, and exposes the aggregated content through a RESTful API.
The system is built in Python: FastAPI serves the API, Celery handles asynchronous task processing, and Scrapy does the web scraping. Data is stored in MongoDB, and Redis acts as the message broker between services.
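The data flow described above (scrape, summarize, store) can be sketched with the standard library alone. This is an illustrative sketch, not the project's code: `queue.Queue` stands in for the Redis-backed Celery queue, a plain dict stands in for MongoDB, and `scrape`, `summarize`, `run_pipeline`, and the `Article` fields are hypothetical names chosen for the example.

```python
import queue
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical document shape; the real schema is not specified here.
    url: str
    title: str
    body: str
    summary: str = ""


def scrape(url: str) -> Article:
    """Stand-in for a Scrapy spider: returns a canned article."""
    return Article(url=url, title="Example headline",
                   body="First sentence. Second sentence.")


def summarize(article: Article) -> Article:
    """Stand-in for the OpenAI GPT call: keeps only the first sentence."""
    first = article.body.split(". ")[0].rstrip(".")
    article.summary = first + "."
    return article


def run_pipeline(urls, store):
    """Queue each URL, then scrape, summarize, and store it."""
    tasks = queue.Queue()  # stand-in for the Redis-backed Celery queue
    for url in urls:
        tasks.put(url)
    while not tasks.empty():
        article = summarize(scrape(tasks.get()))
        store[article.url] = article  # stand-in for a MongoDB upsert keyed by URL
    return store


store = run_pipeline(["https://example.org/a", "https://example.org/b"], {})
```

In a real deployment each stage would likely run as a separate Celery task so that a slow summarization call does not block scraping; the single loop here only shows the order of the steps.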
Highlights
- Automated scraping: Collects articles from various science news sources.
- AI-based summarization and translation: Uses the OpenAI GPT API to generate readable summaries and translations.
- FastAPI backend: Serves a RESTful API with interactive documentation.
- Task queue system: Asynchronous processing of scraping and summarization tasks with Celery and Redis.
- MongoDB storage: Stores the collected article data.
- Authentication system: Planned support for user accounts and commenting.
- Future development: Building a forum for science discussions.
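As a rough idea of what a listing endpoint in the REST API might return, the sketch below filters and pages an in-memory list of articles. Everything here is an assumption for illustration: the field names, the `source`, `skip`, and `limit` parameters, and the `list_articles` function are hypothetical, and a real handler would query MongoDB from a FastAPI route rather than a Python list.

```python
import json

# Hypothetical stored articles; field names are assumptions for this example.
ARTICLES = [
    {"id": 1, "title": "New exoplanet found", "summary": "A short summary.", "source": "site-a"},
    {"id": 2, "title": "Gene editing update", "summary": "A short summary.", "source": "site-b"},
    {"id": 3, "title": "Fusion milestone", "summary": "A short summary.", "source": "site-a"},
]


def list_articles(articles, source=None, skip=0, limit=10):
    """Filter by source, then return one page as a JSON-serializable dict."""
    matched = [a for a in articles if source is None or a["source"] == source]
    return {"total": len(matched), "items": matched[skip:skip + limit]}


# Serialize one page the way an API response body would be serialized.
response = json.dumps(list_articles(ARTICLES, source="site-a", limit=1))
```

Returning the total alongside the page lets a client know how many more requests it needs without fetching everything at once.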
This project lays out an end-to-end architecture for scalable, AI-assisted content aggregation, with planned social features aimed at fostering engagement around scientific topics.