Context
The client generated a large amount of corporate documentation in many different formats and locations. This dispersion made it slow and inefficient to find information for tasks such as drafting newsletters and reports or answering internal queries. Traditional document management tools offered neither a unified access experience nor artificial intelligence capabilities.
Objectives
- Create an intelligent assistant capable of answering questions about the company’s documentation.
- Allow direct access from Microsoft Teams, the users’ daily work environment.
- Reduce time and errors in searching and reusing information.
- Ensure scalability and security in data management.
- Establish a technological foundation for future corporate AI projects.
Requirements
- Integration with the existing Azure Data Lake, using it as the single repository.
- Ability to index documents in multiple formats (Word, PDF, etc.).
- Advanced search engine with vector embeddings and hybrid search (semantic and keyword-based).
- Compliance with Microsoft Azure security and permission standards.
- Modular and open architecture to support evolution and new functionalities.
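A hybrid search has to merge the keyword ranking and the semantic (vector) ranking into a single result list. One common fusion technique is Reciprocal Rank Fusion (RRF). The sketch below is only an illustration of that idea and is not taken from the project: the function name, the document IDs, and the constant k = 60 are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document receives 1 / (k + rank) from every ranking in which it
    appears (rank starts at 1); the contributions are summed and the
    documents are returned from highest to lowest total score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for the same query from the two search modes.
keyword_hits = ["policy.docx", "newsletter.pdf", "report.pdf"]
vector_hits = ["newsletter.pdf", "handbook.pdf", "policy.docx"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, while a document found by only one mode is still kept, which is the behavior the hybrid requirement asks for.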
Implementation
An architecture based on the Retrieval-Augmented Generation (RAG) paradigm was deployed:
- Azure Data Lake as the central document repository.
- Azure Cognitive Search with embeddings to build semantic indexes.
- Azure OpenAI with GPT models to understand and generate responses.
- App Service + Azure Bot Service to provide the chatbot integrated into Teams.
- Azure Cosmos DB for state management and Azure Key Vault for security.
The assistant breaks each document into chunks and indexes them with embeddings so that related passages can be matched semantically. When a user submits a query, the system divides the question into sub-queries and retrieves the most relevant chunks for each one before generating the final response.
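The chunking and sub-query retrieval steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the deployed implementation: the source does not say how chunks are cut, so fixed-size word windows with overlap are assumed, and `Chunk`, `chunk_document`, `retrieve`, and the `search` callable (a stand-in for the real search index) are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str  # identifier of the source document
    index: int   # position of the chunk within the document
    text: str


def chunk_document(doc_id, text, size=200, overlap=40):
    """Split text into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for i, start in enumerate(range(0, len(words), step)):
        chunks.append(Chunk(doc_id, i, " ".join(words[start:start + size])))
        if start + size >= len(words):
            break  # this window already reaches the end of the document
    return chunks


def retrieve(sub_queries, search, top_k=3):
    """Run each sub-query through `search` and merge hits without duplicates.

    `search` is assumed to take a query string and return a list of Chunk
    objects ordered from most to least relevant.
    """
    seen = set()
    merged = []
    for query in sub_queries:
        for chunk in search(query)[:top_k]:
            key = (chunk.doc_id, chunk.index)
            if key not in seen:
                seen.add(key)
                merged.append(chunk)
    return merged


# Toy usage: ten "words", windows of four words with one word of overlap.
demo_chunks = chunk_document("doc-1", " ".join(str(n) for n in range(10)), size=4, overlap=1)
demo_hits = retrieve(["first part", "second part"], lambda q: demo_chunks, top_k=2)
print([c.text for c in demo_chunks])
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, and the merged list in `demo_hits` is what would be passed to the language model as context.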
My contribution
- Definition of the architecture and of the functional and security requirements.
- Selection of the most suitable technologies and coordination with providers.
- Drafting of the project’s formal documentation.
- Deployment and validation with real usage tests.
Conclusions
The project enabled the launch of an intelligent document assistant accessible from Microsoft Teams, capable of retrieving precise information across thousands of documents. This solution improved team productivity and laid the foundation for other AI projects within the organization.
Possible improvements
In the short term, the indexes could be enriched with thematic classification and key phrase extraction. In the medium term, automation mechanisms for document preparation could be incorporated, and the assistant could be extended to channels beyond Teams.