This idea is based on this GitHub repo: https://github.com/joaomdmoura/crewAI-examples/tree/main/stock_analysis
Here's an overview of what this repo does and how it works: the script is built with the CrewAI framework, and its main job is to research a given stock and recommend whether to buy it.
However, I want to modify some things in this script and its workflow, particularly how user input is handled: I want to automate the input process.
**Main Idea and Goal:** Whenever a recent SEC filing is added to the SEC database, it will be scraped (note: only scrape filings added within the past hour). If possible, I'd like to filter by form type (like 10-K) and ignore other files.
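The recency and form-type filter above could look something like the sketch below. It operates on filing records as plain dicts; the `form_type` and `filed_at` field names are assumptions for illustration (a real implementation would map them from EDGAR's "latest filings" feed):

```python
from datetime import datetime, timedelta, timezone

def filter_recent_filings(filings, allowed_forms=("10-K",), max_age_hours=1, now=None):
    """Keep only filings of an allowed form type added within the last hour.

    `filings` is a list of dicts with hypothetical keys `form_type` and
    `filed_at` (a timezone-aware datetime).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=max_age_hours)
    return [
        f for f in filings
        if f["form_type"] in allowed_forms and f["filed_at"] >= cutoff
    ]
```

Passing `now` explicitly makes the filter deterministic and easy to test; in production it defaults to the current UTC time.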
NOTE: All files should be unique, meaning we shouldn't send a previously processed file to our main LLM model. To prevent this, we'll use a database. If scraped SEC filing data content matches an entry in the database, it won't be sent to the next processing stage. If the SEC filing data is unique, then it will be saved to the database and proceed.
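A minimal sketch of that dedup database, using SQLite and a SHA-256 hash of the filing content as the uniqueness key (the table and function names are my own, not from the repo):

```python
import hashlib
import sqlite3

def make_dedup_store(path=":memory:"):
    """Open (or create) a SQLite table of content hashes for processed filings."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_filings (content_hash TEXT PRIMARY KEY)"
    )
    return conn

def is_new_filing(conn, content: str) -> bool:
    """Return True (and record the filing) only the first time this content is seen."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    cur = conn.execute(
        "SELECT 1 FROM seen_filings WHERE content_hash = ?", (digest,)
    )
    if cur.fetchone():
        return False  # already processed; skip the LLM stage
    conn.execute("INSERT INTO seen_filings (content_hash) VALUES (?)", (digest,))
    conn.commit()
    return True
```

Hashing the content rather than storing it whole keeps the table small even at thousands of filings.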
Here's the last filter stage: If the SEC filing company CIK number matches one of the CIK numbers on a list (containing around 2000 CIK numbers), then it will pass to the LLM processing stage.
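One subtlety worth handling in this CIK filter: EDGAR presents CIKs both with and without leading zeros, so normalizing before comparison avoids false misses. A small sketch:

```python
def normalize_cik(cik) -> str:
    """Zero-pad a numeric CIK to 10 digits so '320193' and '0000320193' match."""
    return str(int(cik)).zfill(10)

def build_cik_whitelist(ciks):
    """Build the ~2000-entry whitelist once as a set for O(1) lookups."""
    return {normalize_cik(c) for c in ciks}

def passes_cik_filter(cik, whitelist) -> bool:
    return normalize_cik(cik) in whitelist
```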
After all filters are applied, the script will download the SEC filing data from the given link based on its format (TXT or XML).
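The download step might be sketched like this: classify the link by extension, then fetch it with a descriptive `User-Agent` header (the SEC asks automated clients to identify themselves). The contact string is a placeholder you'd replace with your own:

```python
from urllib.parse import urlparse
import urllib.request

def filing_format(url: str) -> str:
    """Classify a filing link by extension; EDGAR serves .txt and .xml documents."""
    path = urlparse(url).path.lower()
    if path.endswith(".txt"):
        return "txt"
    if path.endswith(".xml"):
        return "xml"
    raise ValueError(f"unsupported filing format: {url}")

def download_filing(url: str):
    """Fetch the filing body and return (format, text)."""
    fmt = filing_format(url)
    # Placeholder User-Agent; substitute your app name and contact address.
    req = urllib.request.Request(
        url, headers={"User-Agent": "my-app contact@example.com"}
    )
    with urllib.request.urlopen(req) as resp:
        return fmt, resp.read().decode("utf-8", errors="replace")
```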
**Crew AI Framework Integration:**
There will be three agents:
* **financial_analyst:** Same as in the example repo.
* **SEC_filing_agent:** To analyze SEC filing data.
* **recommendation_agent:** Same as in the example repo.
**Workflow:**
1. If the SEC filing successfully clears all filter stages, the company name and symbol will be passed to the **financial_analyst** agent. Include the company name as a variable within the agent's description prompt.
2. Send the SEC company filing data to the **SEC_filing_agent**.
3. Results from both agents will be assigned and sent to the **recommendation_agent**, which will decide to take a long or short position on the stock (as in the example repo).
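The three-step workflow above can be sketched as a plain-Python pipeline. Here each agent is just a callable stand-in; in the real script each would be a CrewAI `Agent` with an associated `Task`, and the function names and prompt wording below are illustrative assumptions:

```python
def run_pipeline(company_name: str, symbol: str, filing_text: str,
                 financial_analyst, sec_filing_agent, recommendation_agent) -> str:
    """Wire the three agents together; each agent is any callable standing in
    for a CrewAI Agent/Task pair (a simplification for illustration)."""
    # Step 1: the analyst prompt embeds the company name as a variable.
    analyst_prompt = (
        f"Conduct a thorough financial analysis of {company_name} ({symbol})."
    )
    analysis = financial_analyst(analyst_prompt)
    # Step 2: the SEC filing agent receives the raw filing content.
    filing_summary = sec_filing_agent(filing_text)
    # Step 3: both results feed the recommendation agent (long/short decision).
    return recommendation_agent(analysis, filing_summary)
```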
I hope you understand my main goal with this framework and script. If you have suggestions for a less complex approach or a more valuable output, please share them!
**Main Limitations/Problems:**
1. **High LLM API Cost:** Using something like GPT-4 for all processing of a single SEC filing is expensive (around $0.90 depending on form type, driven mostly by the ~55,000-word average length of 10-K forms).
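A back-of-the-envelope cost helper makes this concrete. The ~1.33 tokens-per-word ratio is a common rule of thumb for English text, and the per-token price is a parameter you'd fill in from your provider's current pricing (the value in the test below is an assumption, not a quoted rate):

```python
def estimate_llm_cost(word_count: int, usd_per_1k_tokens: float,
                      tokens_per_word: float = 1.33) -> float:
    """Rough input-side cost: words -> approximate tokens -> price in USD."""
    tokens = word_count * tokens_per_word
    return tokens / 1000 * usd_per_1k_tokens
```

For a 55,000-word 10-K at an assumed $0.01 per 1K input tokens, this gives roughly $0.73 for the input side alone, in the same ballpark as the $0.90 figure above once output tokens are included.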
2. **Removing Unnecessary Words:** Raw SEC filing data (especially in TXT format) contains extraneous markup such as SGML/HTML tags, which inflates token counts and should be stripped before the text is sent to the LLM.
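A crude first pass at stripping that markup, assuming regex-based cleaning is acceptable (a real pipeline might prefer a proper HTML parser):

```python
import re

def strip_filing_markup(raw: str) -> str:
    """Remove SGML/HTML tags and entities from a raw EDGAR .txt filing,
    then collapse runs of whitespace. A rough heuristic, not a full parser."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)                   # drop <TAG ...> markup
    no_entities = re.sub(r"&[a-zA-Z#0-9]+;", " ", no_tags)   # drop &nbsp; etc.
    return re.sub(r"\s+", " ", no_entities).strip()          # collapse whitespace
```

Even this simple pass can shrink a raw full-submission file substantially before tokenization.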
NOTE: I am still working on this idea and testing whether LLM output is suitable for it. I am also testing a cost-effective solution by integrating RAG into the system. If you're interested in learning more, this YouTube video explains how it works and the advantages it offers: https://www.youtube.com/watch?v=OYg1-90n0Ro&t=607s