Problem Statement:
- The manual CV review process is inefficient, and outdated or poor-quality CVs put business deals at risk.
- Manually validating about 1,000 employee CVs would require one full-time person.
- Main challenges identified:
1. The manual review process is time-consuming.
2. Lack of common understanding and guidelines for CV content quality.
3. Need for instant feedback on CV quality to avoid last-minute issues.
4. Need for reporting and metrics on CV quality trends.

Quality Metrics for CVs:
- Completeness: all required fields filled (currently manageable).
- Consistency: no contradictory info (managed via dictionaries).
- Validity: correct formats and timestamps (handled by validators).
- Uniqueness: no duplicate data.
- **Accuracy: the main missing metric—precision and quality of content descriptions in CVs.**

AI-Driven Solution Approach:
- AI processes CV documents (PDF, DOC) to answer specific true/false questions related to CV quality (e.g., presence of technical skills, formal writing).
- The answers form a binary array from which a CV quality score is computed.
- Enables automatic and objective CV quality assessment to replace manual reviews.
- Provides instant feedback regarding CV adequacy.
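
The step from a binary answer array to a score can be sketched as follows. The summary does not state the scoring formula, so this assumes the simplest reading: the score is the unweighted share of questions answered "true". The class name, method name, and sample questions are hypothetical.

```java
final class CvQualityScore {

    /**
     * Returns the percentage of questions answered "true".
     * Assumption: every question has equal weight; an empty question set scores 0.
     */
    static double scorePercent(boolean[] answers) {
        if (answers.length == 0) {
            return 0.0;
        }
        int passed = 0;
        for (boolean answer : answers) {
            if (answer) {
                passed++;
            }
        }
        return 100.0 * passed / answers.length;
    }

    public static void main(String[] args) {
        // Hypothetical answers to four true/false questions, for example:
        // lists technical skills, uses formal writing, describes projects, names roles.
        boolean[] answers = {true, true, false, true};
        System.out.println("CV quality score: " + scorePercent(answers) + "%");
    }
}
```

A weighted variant would only need a parallel array of weights, which may be worth considering if some questions matter more to the business than others.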

Technology Stack and Architecture:
- Preference for Java-based integration due to internal developer skillset.
- Two options for Java + AI: Spring AI (simpler, for integration into existing apps) and LangChain4j (more flexible for complex AI apps).
- Architecture overview:
  - A REST API receives CVs and stores them locally.
  - Documents are sliced into segments and processed by embedding models to create vector representations.
  - A vector store (initially Elasticsearch, moving to Qdrant) holds the embeddings.
  - The LLM (Large Language Model) answers the quality questions from the segments retrieved via these embeddings.
  - AI models are hosted on OVHcloud (a European, regulation-compliant provider) with GPU support.
  - Ollama manages LLM deployment, either locally or in the cloud.
- Recommended model: Gemma 3 (performs well on a single GPU, cost-effective).
- Performance example: processing 1,000 CVs takes about 2 hours and costs approximately 7 PLN on a Tesla V100 GPU.
- Trade-offs between larger and smaller models were considered to balance cost, speed, and quality.
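
The "slicing" step of the pipeline can be illustrated with a minimal sketch in plain Java. It assumes fixed-size character windows with a small overlap, which is one common strategy. The summary does not say which splitter or segment size was used, and a framework such as LangChain4j ships its own document splitters that would normally replace this. The class name and parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class CvSlicer {

    /**
     * Splits text into segments of at most `size` characters, where each
     * segment repeats the last `overlap` characters of the previous one.
     * Assumption: character-based windows; real splitters often work on
     * sentences or tokens instead.
     */
    static List<String> slice(String text, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("need size > 0 and 0 <= overlap < size");
        }
        List<String> segments = new ArrayList<>();
        int step = size - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + size, text.length());
            segments.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window already reached the end of the text
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        String cvText = "Senior Java developer with ten years of experience in banking projects.";
        for (String segment : slice(cvText, 30, 5)) {
            System.out.println("[" + segment + "]");
        }
    }
}
```

Each segment would then be passed to the embedding model and stored in the vector store together with a reference to the source CV.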

Demo Highlights:
- The system loads multiple CVs, embeds them, and scores them.
- Scores range from 0% (empty CV) to 100%.
- Enables tracking quality scores over time for teams/projects.
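
Tracking scores per team could start from a simple aggregation like the one below, run once per scoring batch so that the results can be compared over time. This is a sketch only: the `ScoredCv` type, its fields, and the team names are illustrative assumptions and do not come from the demo.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class TeamQualityReport {

    /** One scored CV; the fields are illustrative, not taken from the talk. */
    static final class ScoredCv {
        final String team;
        final double scorePercent;

        ScoredCv(String team, double scorePercent) {
            this.team = team;
            this.scorePercent = scorePercent;
        }
    }

    /** Returns the average score per team, sorted by team name. */
    static Map<String, Double> averageByTeam(List<ScoredCv> cvs) {
        return cvs.stream().collect(Collectors.groupingBy(
                cv -> cv.team,
                TreeMap::new,
                Collectors.averagingDouble(cv -> cv.scorePercent)));
    }

    public static void main(String[] args) {
        // Hypothetical snapshot of one scoring run.
        List<ScoredCv> snapshot = List.of(
                new ScoredCv("Team A", 80.0),
                new ScoredCv("Team A", 60.0),
                new ScoredCv("Team B", 100.0));
        averageByTeam(snapshot).forEach((team, average) ->
                System.out.println(team + ": " + average + "%"));
    }
}
```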

Benefits and Future Use:
- Enables building BI dashboards to monitor CV quality per employee/project.
- Allows sending reminders to employees/managers to update CVs.
- Cost-effective and scalable solution, particularly with growing data volumes.
- Encourages use of local AI models for data privacy and compliance reasons.
- Continuous improvement of models expected.
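
To put the cost claim in numbers, the performance example above (1,000 CVs in about 2 hours for approximately 7 PLN) works out to roughly 7.2 seconds and 0.007 PLN per CV. The sketch below extrapolates from those two figures. It assumes time and cost scale linearly with the number of CVs on the same hardware, which the summary does not confirm, and the class and method names are hypothetical.

```java
final class CvBatchEstimate {

    // Figures from the performance example: 1,000 CVs, about 2 hours, about 7 PLN.
    static final double SECONDS_PER_CV = 2 * 3600.0 / 1000;
    static final double PLN_PER_CV = 7.0 / 1000;

    /** Estimated processing time in hours, assuming linear scaling. */
    static double hoursFor(int cvCount) {
        return cvCount * SECONDS_PER_CV / 3600.0;
    }

    /** Estimated cost in PLN, assuming linear scaling. */
    static double costPlnFor(int cvCount) {
        return cvCount * PLN_PER_CV;
    }

    public static void main(String[] args) {
        int cvCount = 5000; // hypothetical future volume
        System.out.printf("%d CVs: about %.1f h, about %.2f PLN%n",
                cvCount, hoursFor(cvCount), costPlnFor(cvCount));
    }
}
```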

Actionable Items / Tasks:
- Explore integration of AI-powered CV quality scoring within HR systems.
- Define question sets and scoring metrics for CV evaluation.
- Choose an appropriate AI framework (Spring AI or LangChain4j) based on project complexity.
- Deploy AI models (Gemma 3 recommended) on compliant cloud infrastructure or locally via Ollama.
- Implement a vector database to store embedding vectors; consider Qdrant or Elasticsearch.
- Develop BI dashboards for monitoring CV quality metrics and trends.
- Set up alert/reminder mechanisms to prompt employees for CV updates.
- Assess costs and performance trade-offs for AI model choices.
- Ensure compliance with GDPR and data security best practices during data handling.
- Engage with Sopra Steria’s team for expertise and collaboration opportunities on AI solutions.

Empowering HR with AI: Data Processing with LangChain4j

14:00 - 14:30, 27th of May (Tuesday) 2025 / DEV AI & DATA STAGE

Employee competency data is critical for HR departments—it enables effective human resource management, skills development, and the achievement of strategic organizational goals. However, a key question arises: are the data we rely on truly up-to-date and of sufficient quality to support sound decision-making?

Can artificial intelligence help us address this challenge?

In this session, I will present a real business case where the implementation of AI supported HR data quality analysis. I will discuss the key obstacles that hindered implementation and played a significant role in selecting the appropriate system architecture.

I will also cover the technical aspects of using AI for data analysis and explain why I chose the LangChain4j Java framework for this solution.

To wrap up, I will outline the strengths and limitations of the approach.

LEVEL:
Basic Advanced Expert
TRACK:
AI/ML Data
TOPICS:
AI

Miłosz Niczyporuk

Sopra Steria