Data preparation for RAG
Web scraping tool: Crawl4AI (YouTube video)
Sitemap (lists the pages a site exposes for crawling): www.example.com/sitemap.xml, e.g. https://www.cisco.com/web/sitemap/www_cisco_com_en_us_index.xml
robots.txt (check the Allow/Disallow list before crawling): www.example.com/robots.txt, e.g. https://www.cisco.com/robots.txt
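A minimal sketch of how the sitemap and robots.txt checks could be combined before handing URLs to a scraper. It uses only the Python standard library; the function names `sitemap_urls` and `allowed_urls` are hypothetical helpers, not part of Crawl4AI, and the sketch assumes the sitemap follows the sitemaps.org schema.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# XML namespace defined by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> value from a sitemap (or sitemap index) document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def allowed_urls(urls: list[str], robots_txt: str, user_agent: str = "*") -> list[str]:
    """Keep only the URLs that robots.txt permits for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]
```

Both helpers take the file contents as text, so the live files can be downloaded first (for example with `urllib.request.urlopen`) and passed in. A sitemap index such as the Cisco example lists further sitemap files rather than pages, so each `<loc>` it returns would need to be fetched and parsed in turn.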
Other documents, incl. PDF, DOCX, PPTX, XLSX, HTML, WAV, MP3, VTT, images (PNG, TIFF, JPEG, …): Docling
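A rough sketch of converting one such document to Markdown for a RAG pipeline, assuming Docling is installed (`pip install docling`) and using its `DocumentConverter` quickstart interface. The helper names and the extension set are hypothetical, and audio formats may need extra Docling configuration that is not shown here.

```python
from pathlib import Path

# Extensions from the format list above; ".jpg" is added as a common alias (assumption)
DOC_EXTENSIONS = {
    ".pdf", ".docx", ".pptx", ".xlsx", ".html",
    ".wav", ".mp3", ".vtt", ".png", ".tiff", ".jpeg", ".jpg",
}


def is_candidate(path: str) -> bool:
    """True if the file extension is one of the formats listed above."""
    return Path(path).suffix.lower() in DOC_EXTENSIONS


def to_markdown(source: str) -> str:
    """Convert one document (local path or URL) to Markdown with Docling."""
    # Imported lazily so is_candidate() works even without Docling installed
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    return result.document.export_to_markdown()
```

In use, a folder of files could be filtered with `is_candidate` and each match passed to `to_markdown`, with the resulting Markdown then chunked and embedded.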