End of Studies Projects 2025

Our hand-picked team consists of passionate, hard-working, collaborative, and forward-thinking individuals with one common interest in mind: a love for data and retail.

Our available projects by department

Project Overview:

Create a generic microservice that, given an expression (similar to what we currently use in our Advanced Search), generates the necessary SQL queries to fetch data from Postgres.

Requirements:

  • Build a validation layer for the expressions.
  • Integrate the microservice with real-time query performance tracking.

Stretch Goals: Allow Postgres fuzzy matching & aggregations on data.

Technologies: Scala / Postgres / Docker / Kubernetes
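
As a rough illustration of the idea, the sketch below uses an assumed toy expression grammar and made-up field names (the real Advanced Search expressions and schema will differ): a small expression ADT rendered into a parameterized Postgres WHERE clause, with a column whitelist acting as a first validation layer.

```scala
// Illustrative sketch only: the expression grammar, field names, and validation
// rules are assumptions, not the actual Advanced Search format.
sealed trait Expr
case class Eq(field: String, value: String)     extends Expr
case class Gt(field: String, value: BigDecimal) extends Expr
case class And(left: Expr, right: Expr)         extends Expr
case class Or(left: Expr, right: Expr)          extends Expr

object SqlGenerator {
  // A whitelist of queryable columns acts as a simple validation layer.
  private val allowedFields = Set("sku", "store_id", "price", "sales_qty")

  def toWhereClause(e: Expr): Either[String, (String, List[Any])] = e match {
    case Eq(f, v) if allowedFields(f) => Right((s"$f = ?", List(v)))
    case Gt(f, v) if allowedFields(f) => Right((s"$f > ?", List(v)))
    case Eq(f, _)                     => Left(s"Unknown field: $f")
    case Gt(f, _)                     => Left(s"Unknown field: $f")
    case And(l, r)                    => combine(l, r, "AND")
    case Or(l, r)                     => combine(l, r, "OR")
  }

  private def combine(l: Expr, r: Expr, op: String): Either[String, (String, List[Any])] =
    toWhereClause(l).flatMap { case (ls, lp) =>
      toWhereClause(r).map { case (rs, rp) => (s"($ls $op $rs)", lp ++ rp) }
    }
}

// Example: And(Eq("store_id", "42"), Gt("price", BigDecimal(10))) becomes
// "(store_id = ? AND price > ?)" with parameters List("42", 10), ready to be
// bound to a PreparedStatement against Postgres.
```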

Project Overview:

Most NoSQL databases do not support a rollback mechanism because of the nature of the DB. The task here is to implement a library, easy to plug into any codebase, that allows data to be recovered programmatically when something goes wrong. Ideally, it should work with any NoSQL DB, and data recovery can be handled asynchronously.

Requirements:

  • The candidate should choose an appropriate design pattern so that the library is easy to maintain and to integrate into a codebase.
  • Log any errors that occur.
  • Test the library within one of our microservices.

Technologies: Scala / Cassandra / Mongo / Redis / Kafka / RabbitMQ / Kubernetes
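
One possible shape for such a library, sketched under assumptions (the RollbackStore trait and its method names are invented for illustration, not an existing API): a store-agnostic wrapper that records a compensating action before each write and replays the log asynchronously on rollback.

```scala
import scala.concurrent.{ExecutionContext, Future}

// Illustrative sketch only: the RollbackStore trait, method names, and the
// undo-log approach are assumptions, not an existing API.
trait RollbackStore[K, V] {
  def read(key: K): Future[Option[V]]
  def write(key: K, value: V): Future[Unit]
  def delete(key: K): Future[Unit]
}

class TransactionalWrapper[K, V](store: RollbackStore[K, V])(implicit ec: ExecutionContext) {
  // Compensating actions, recorded in reverse order of the writes.
  // (Kept as a simple var for brevity; a real library would need thread safety.)
  private var undoLog: List[() => Future[Unit]] = Nil

  def put(key: K, value: V): Future[Unit] =
    store.read(key).flatMap { previous =>
      // Record how to restore the previous state before mutating it.
      val compensation: () => Future[Unit] = previous match {
        case Some(old) => () => store.write(key, old)
        case None      => () => store.delete(key)
      }
      undoLog = compensation :: undoLog
      store.write(key, value)
    }

  // Replay the compensations (asynchronously) if something goes wrong.
  def rollback(): Future[Unit] =
    undoLog.foldLeft(Future.successful(())) { (acc, undo) => acc.flatMap(_ => undo()) }
}
```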

Project Overview:

The Frontend Performance Monitoring Dashboard is a tool for tracking and visualizing key performance metrics across frontend applications. It collects data on page load times, resource usage, rendering speed, and user interactions to help the frontend team understand how their apps perform in real-world conditions. The tool includes a backend for storing historical data and generating performance reports, allowing the team to analyze and optimize over time.

Technologies:

  • Frontend: React, Chart.js, D3.js, Material UI
  • Backend: Node or Scala; MongoDB, PostgreSQL
  • Data Collection: Performance monitoring libraries like Web Vitals, and possibly a third-party service such as Sentry or Google Analytics for deeper analytics and comparison

Project Overview:

This project focuses on helping retailers optimize their supply chain by predicting stock levels and forecasting product demand. The platform will allow retailers to track product movement, forecast future demand based on historical sales data, and suggest replenishment orders with suppliers to avoid stockouts and overstocking. By using machine learning or basic statistical methods to analyze sales patterns and other external factors (seasonality, promotions, etc.), the system can provide more accurate inventory management recommendations. This system will have multiple pages to manage inventory, forecast demand, track suppliers, and generate reports.

Technologies:

  • Front-End: React.js, Redux (for state management), Tailwind CSS or Material-UI (for UI components)
  • Back-End: Node.js, Express.js.
  • Database: PostgreSQL (for structured relational data on products, inventory, and suppliers)
  • Forecasting: Python (for machine learning or statistical algorithms such as ARIMA, scikit-learn for regression analysis)
  • Data Visualization: Chart.js or D3.js (for trend visualization and interactive graphs)
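
The forecasting component is planned in Python, but the simplest baseline a candidate might start from, a seasonal moving average, is easy to sketch (shown in Scala here; the season length and numbers are purely illustrative).

```scala
// Illustrative baseline only: the project plans Python (ARIMA, scikit-learn) for
// the real forecasting; this just shows the shape of the problem.
object DemandForecast {
  /** Forecast the next period as the average of the same phase in past seasons,
    * e.g. with seasonLength = 7, next Monday's demand is the mean of past Mondays. */
  def seasonalAverage(history: Vector[Double], seasonLength: Int): Double = {
    val nextPhase = history.length % seasonLength
    val samePhase = history.zipWithIndex.collect {
      case (qty, i) if i % seasonLength == nextPhase => qty
    }
    if (samePhase.isEmpty) 0.0 else samePhase.sum / samePhase.size
  }
}

// Example: two weeks of daily sales, forecasting day 15 (same weekday as days 1 and 8):
// DemandForecast.seasonalAverage(Vector(12, 9, 14, 11, 18, 25, 22,
//                                       13, 10, 15, 12, 17, 27, 21), seasonLength = 7)
// returns (12 + 13) / 2 = 12.5
```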

Project Overview:

Designing a tool that enables software engineers to visualize and analyze dependencies across application components, facilitating a clearer understanding of the interactions among different parts of the application and providing insights into areas of tight code coupling.

Goals: 

  • Helping the team to identify areas for refactoring, modularization or optimization to reduce tight coupling, making the system easier to maintain.
  • Helping the team to identify areas of the codebase that need more attention during code reviews, ensuring that high-dependency areas are properly tested.
  • Making it easier for new team members to get a quick overview of the project structure, speeding up the onboarding process.

Key features (non-exhaustive):

  • Dependency graph visualization
  • Circular dependency highlighting
  • Interactive graph navigation (Zoom in/out, collapse or expand sections of the graph for a detailed view of dependencies…)
  • Search and filter capabilities (to quickly locate specific components and explore their dependencies based on various criteria)
  • Impact analysis of changing a particular node
  • Refactoring recommendations
  • Seamless navigation between different projects, branches, or commits

Technologies:

  • Frontend: React.js, Material-UI
  • Data visualization libraries: D3.js, Vis.js…
  • Backend: Node.js, Express.js
  • Database: NoSQL database (Neo4j, MongoDB…)
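
As an illustration of the circular-dependency highlighting feature, here is a minimal sketch (the graph representation and component names are assumptions): a depth-first search over an adjacency map that returns the first cycle it finds.

```scala
// Illustrative only: components are keyed by name in a simple adjacency map.
object CycleDetector {
  type Graph = Map[String, List[String]]

  /** Returns the first dependency cycle found, if any, as a path of component names. */
  def findCycle(graph: Graph): Option[List[String]] = {
    def visit(node: String, path: List[String]): Option[List[String]] =
      if (path.contains(node))
        // Close the loop: keep only the part of the path that forms the cycle.
        Some((node :: path).reverse.dropWhile(_ != node))
      else
        graph.getOrElse(node, Nil).view
          .flatMap(dep => visit(dep, node :: path))
          .headOption

    graph.keys.view.flatMap(start => visit(start, Nil)).headOption
  }
}

// Example (made-up components):
// CycleDetector.findCycle(Map(
//   "orders" -> List("pricing"), "pricing" -> List("promotions"), "promotions" -> List("orders")))
// returns Some(List("orders", "pricing", "promotions", "orders")).
```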

Project Overview:

The goal of this project is to develop a user interface (UI) application that allows users to construct, configure, and preview settings for the main application. The UI app will enable users to adjust configurations, preview their changes in real time, and save the configurations to be dynamically served to the main app. These configurations will be handled through a Node.js backend, and NGINX will be used as a reverse proxy to forward configuration requests from the UI to the Node.js server.

Key Features:

  • UI Configuration Editor: Users can create and modify configurations through an intuitive UI.
  • Preview Changes: Real-time preview of how each configuration change will impact the UI or behavior of the app. Changes should be visible immediately without needing to save.
  • Validation: Ensure that the user inputs are valid (e.g., required fields, format checks, range checks) and provide feedback if something is wrong.
  • Save Configurations: After users have finalized their configuration, they can save it in a structured format that can be used by the main app and later served through the web server.
  • Versioning: Each configuration saved should have a version number, allowing users to track changes over time and manage multiple versions.

Technologies:

  • Front-End: React.js, Redux, Tailwind CSS or Material-UI
  • Back-End: Node.js, Express.js.
  • Database: PostgreSQL
  • WebServer: NGINX
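
Although the back end is planned in Node.js, the Validation and Versioning behaviour described above can be sketched in a few lines; the configuration fields, limits, and versioning rule below are invented for illustration.

```scala
// Illustrative only: field names, limits, and the versioning rule are assumptions.
case class AppConfig(theme: String, pageSize: Int, apiBaseUrl: String, version: Int)

object ConfigValidator {
  /** Required fields, format checks, and range checks; returns a list of problems found. */
  def validate(c: AppConfig): List[String] = List(
    if (c.theme.trim.isEmpty) Some("theme is required") else None,
    if (c.pageSize < 1 || c.pageSize > 500) Some("pageSize must be between 1 and 500") else None,
    if (!c.apiBaseUrl.startsWith("http")) Some("apiBaseUrl must be a valid URL") else None
  ).flatten

  /** Saving never overwrites: each save produces a new, incremented version number. */
  def nextVersion(existing: Seq[AppConfig]): Int =
    if (existing.isEmpty) 1 else existing.map(_.version).max + 1
}
```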

Project Overview:

Making machine learning models more interpretable and trustworthy for demand forecasting by applying explainable AI techniques. Use methods like SHAP (Shapley Additive Explanations) or LIME (Local Interpretable Model-agnostic Explanations) to explain predictions made by simple and complex models (e.g., linear regression vs deep learning) to non-technical stakeholders.

Project Overview:

Past forecast assessments revealed cases where products appear suited for grouping under a new level, termed ‘super_SKUs.’ The current grouping method relies on time series patterns, but incorporating semantic understanding could improve accuracy and add valuable business insight.
This internship will explore adding semantic similarity metrics directly to the SKU detection process or as a post-processing validation step for ‘super_SKUs.’ This approach will leverage a language model, such as product embeddings from transactions, or involve fine-tuning an existing LLM architecture.
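
A rough sketch of the post-processing variant (the pairwise cosine-similarity check and the threshold are assumptions for illustration): a candidate 'super_SKU' is kept only if its members' embeddings are close enough.

```scala
// Illustrative only: embeddings could come from a language model or be learned
// from transactions; the 0.7 threshold is an assumption.
object SemanticGrouping {
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    val dot   = a.zip(b).map { case (x, y) => x * y }.sum
    val norms = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
    if (norms == 0) 0.0 else dot / norms
  }

  /** Validate a candidate super_SKU: every pair of member embeddings must be similar enough. */
  def isCoherentGroup(embeddings: Seq[Array[Double]], threshold: Double = 0.7): Boolean =
    embeddings.combinations(2).forall { pair => cosine(pair.head, pair(1)) >= threshold }
}
```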

Project Overview:

In many data workflows, certain data is ingested sporadically, with new files or updates appearing irregularly rather than on a predictable schedule. Setting up a daily processing pipeline for such data can be costly and inefficient, as it often sits idle. An event-driven pipeline addresses this by triggering data processing only when new data arrives, ensuring resources are used effectively and data processing is initiated exactly when needed.

Goal: Create an event-driven pipeline to process data based on events, mainly file uploads to the data lake.

Steps:

  • Event Listening: Set up an event-driven architecture using a message broker like Kafka or a cloud webhook like Prefect API.
  • Event Configuration: Configure event sources to trigger events for file drops to the Azure Data Lake (ADLS).
  • Data Processing: Load the data from the ADLS and process it using Spark.
  • Data Validation: Implement data quality validation checks and generate reports to flag inconsistent data.
  • Data Load: Load the processed data to PostgreSQL.
  • Pipeline Automation: Automate the ETL pipeline using an Airflow DAG.
  • DAG Triggering: Configure a service to trigger the ETL pipeline in Airflow with every new detected event.

Tech Stack:

Scala / Python / Spark / Prefect or Kafka / Airflow / Docker / Kubernetes / Azure Data Lake / PostgreSQL
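
A possible sketch of the Kafka-based variant of the DAG Triggering step (topic name, DAG id, hosts, and the event payload are placeholders; authentication and error handling are omitted): consume file-upload events and start the Airflow DAG through its REST API.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.time.Duration
import java.util.Properties
import scala.jdk.CollectionConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer

// Illustrative sketch: topic, DAG id, and host names are placeholders.
object EventTrigger extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "kafka:9092")
  props.put("group.id", "adls-file-events")
  props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(List("adls-file-uploads").asJava)
  val http = HttpClient.newHttpClient()

  while (true) {
    val records = consumer.poll(Duration.ofSeconds(5)).asScala
    records.foreach { record =>
      // Each event carries the path of the newly uploaded file; pass it to the DAG run.
      val body = s"""{"conf": {"file_path": "${record.value()}"}}"""
      val request = HttpRequest.newBuilder()
        .uri(URI.create("http://airflow:8080/api/v1/dags/adls_ingest/dagRuns"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build()
      http.send(request, HttpResponse.BodyHandlers.ofString())
    }
  }
}
```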

Project Overview:

CRSP, short for Cognira’s Retail Science Platform, is an internal platform that provides the tools and infrastructure needed to manage big data and run complex transformation pipelines.

Goal: Build a metadata management system for the datasets ingested and processed by CRSP to improve data lineage and discoverability.

Steps:

  • Data Ingestion: Use Spark to load the CRSP datasets' metadata (creation date, format, schema, etc.) and store it in PostgreSQL.
  • Metadata Catalog: Implement a metadata store using OpenMetadata to centralize all the information and enable users to search for datasets by schema, date, etc.
  • Data Lineage: Store the schema versions to track the schema evolution over time. Keep track of each dataset’s transformations and update the metadata with every CRSP transformation that’s created or deleted.
  • Data Validation: Add data quality validation checks and flag missing or inconsistent metadata. Generate periodic reports on CRSP usage, dataset growth trends, and metadata quality.
  • Pipeline Automation: Automate the workflow using an Airflow DAG.

Tech stack: Scala / Spark / Azure Data Lake / OpenMetadata / PostgreSQL / Docker / Kubernetes / Airflow
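
A minimal sketch of the Data Ingestion step, under assumed paths, table names, and connection details: capture a dataset's schema and size with Spark and append the result to PostgreSQL, so later runs can compare schemas for the lineage step.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession

// Illustrative sketch: the dataset path, table name, and credentials are placeholders.
object MetadataIngestion extends App {
  val spark = SparkSession.builder().appName("crsp-metadata").getOrCreate()
  import spark.implicits._

  val datasetPath = "abfss://crsp@datalake.dfs.core.windows.net/datasets/sales"
  val df = spark.read.parquet(datasetPath)

  // One metadata row per dataset: the schema is captured as JSON so later runs
  // can be compared against it to track schema evolution.
  val metadata = Seq(
    (datasetPath, "parquet", df.schema.json, df.count(), new Timestamp(System.currentTimeMillis()))
  ).toDF("dataset_path", "format", "schema_json", "row_count", "collected_at")

  metadata.write
    .format("jdbc")
    .option("url", "jdbc:postgresql://postgres:5432/crsp_metadata")
    .option("dbtable", "dataset_metadata")
    .option("user", "crsp")
    .option("password", sys.env.getOrElse("PG_PASSWORD", ""))
    .mode("append")
    .save()
}
```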

Project Overview:

As Spark jobs scale on Kubernetes, performance bottlenecks can lead to inefficient resource usage, high costs, and delayed processing times. Identifying and resolving these bottlenecks often requires deep expertise in Spark configurations and code optimizations. A tool that leverages a language model (LLM) to analyze job performance, pinpoint bottlenecks, and suggest targeted code or configuration adjustments could empower engineers to optimize their Spark workloads more effectively, reducing latency and improving resource efficiency.

Goal: Develop a tool that helps identify bottlenecks in Spark jobs running on Kubernetes and suggests code and configuration improvements using an LLM.

Steps:

  • Metrics Collection: Deploy Prometheus to scrape Spark performance metrics from jobs running on Kubernetes (e.g., run time, CPU usage, memory usage). Write a Scala client to retrieve these metrics from the Prometheus API and extract the physical plan from the Spark UI API.
  • Performance Analysis and Evaluation: Send the collected data and physical plan summary to the LLM with prompts to detect faulty patterns and identify potential bottlenecks and expensive operations.
  • Generating Recommendations: Write code that generates tuning suggestions, with explanations, based on the collected data; e.g., recommend caching datasets that are used multiple times in the transformation, or suggest a broadcast join when the size of the data is below a certain threshold.
  • Dashboard: Build an interactive UI that allows users to monitor the lifecycle of every Spark job, visualize performance metrics and optimization recommendations, and receive LLM-powered explanations.
  • LLM Feedback Loop: Store the historical job data and patterns in order to fine-tune the LLM periodically and improve its recommendations over time.

Tech stack: Scala / Spark / Kubernetes / Prometheus / Llama / React
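
To illustrate the Generating Recommendations step, here is a sketch of two rule-based checks that could run on the collected metrics before anything is sent to the LLM; the JobMetrics fields and the 10 MB broadcast threshold are assumptions.

```scala
// Illustrative sketch: the metric fields and thresholds are assumptions.
case class JobMetrics(
    datasetName: String,
    sizeBytes: Long,
    timesScanned: Int,
    shuffleReadBytes: Long
)

object Recommendations {
  private val BroadcastThreshold = 10L * 1024 * 1024 // 10 MB, illustrative

  def suggest(m: JobMetrics): List[String] = List(
    if (m.timesScanned > 1)
      Some(s"${m.datasetName} is scanned ${m.timesScanned} times; consider .cache() or .persist()")
    else None,
    if (m.sizeBytes < BroadcastThreshold && m.shuffleReadBytes > 0)
      Some(s"${m.datasetName} is only ${m.sizeBytes} bytes; consider a broadcast join to avoid the shuffle")
    else None
  ).flatten
}
```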

Project Overview:

Build an API-driven tool that allows users to request custom metrics, triggering a Spark job to calculate these metrics and return the results in real time or near-real time, depending on processing complexity. This tool will be capable of handling both complex and simple metric computations, making it a versatile addition to data analytics for retailers.

Goal: Develop an API-driven tool that lets users request custom metrics, triggers the corresponding Spark jobs, and returns the results in real time or near-real time.

Steps:

  • Develop a RESTful API that serves as the interface for users to request specific metrics. Each request will contain a payload specifying the metrics to calculate.
  • Implement an event-driven mechanism where each API request triggers a Spark job. Use a message broker to queue requests and allow Spark to process them sequentially or in parallel.
  • Create an Airflow DAG to orchestrate these Spark jobs. Each API request will trigger an Airflow task that runs the appropriate Spark job for the requested metrics.
  • Set up logic in the API to determine which specific Spark jobs to run based on the requested metrics, including any custom filters or aggregations.

Spark Job Design for Metric Calculations

  • Reusable Metric Calculations: Develop modular Spark jobs that can calculate different metrics based on parameters received from the API payload.
  • Performance Optimization: Use techniques like partitioning, caching, and broadcast joins in Spark to optimize processing, especially for complex or high-volume requests.

Data Storage and Access

  • Source Data Access: Connect the Spark jobs to relevant data sources (PostgreSQL, Cassandra/DuckDB).
  • Metrics Storage: Use caching for intermediate results and persist historical metrics.


Stretch goals: Error Handling, Validation, and Logging

  • Input Validation: Validate incoming API payloads to ensure they contain valid metric names and properly structured filter parameters.
  • Error Handling and Logging: Implement error handling to catch and log issues like missing data, invalid metric requests, or Spark job failures.
  • Alerting Mechanism: Set up alerts for failed or delayed jobs, with notifications sent to a monitoring system or Slack.

Monitoring and Scaling

  • Metrics Dashboard: Use Prometheus and Grafana to monitor the API performance, Spark job completion rates, and error rates.
  • Auto-Scaling on Kubernetes: Deploy the API and Spark cluster on Kubernetes, using auto-scaling to dynamically allocate resources based on API load and job volume.

Create a UI

Tech stack: Scala / Spark / Airflow / Kubernetes / Kafka / Cassandra, DuckDB, or PostgreSQL / Prometheus, Grafana
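
A sketch of the request path only, with assumed payload fields, metric names, and topic: the API validates the payload and queues it on Kafka, and the consumer side triggers the matching Spark job.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Illustrative sketch: payload fields, metric names, and the topic are assumptions.
case class MetricRequest(metric: String, storeIds: List[String], from: String, to: String)

object MetricRequestQueue {
  private val knownMetrics = Set("sell_through_rate", "avg_basket_size", "stockout_days")

  private val producer = {
    val props = new Properties()
    props.put("bootstrap.servers", "kafka:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    new KafkaProducer[String, String](props)
  }

  /** Validate the request, then enqueue it; the consumer side triggers the matching Spark job. */
  def submit(req: MetricRequest): Either[String, Unit] =
    if (!knownMetrics(req.metric)) Left(s"Unknown metric: ${req.metric}")
    else {
      val payload =
        s"""{"metric":"${req.metric}","stores":"${req.storeIds.mkString(",")}","from":"${req.from}","to":"${req.to}"}"""
      producer.send(new ProducerRecord("metric-requests", req.metric, payload))
      Right(())
    }
}
```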

Project Overview:

The project involves designing and developing a solution to create a centralized repository of operational metrics for monitoring the Lakehouse (data ingress).

Steps:

  • Build a Spark listener that triggers each time data is updated.
  • Collect various metadata from the Delta log.
  • Define KPI metrics to assess data quality and table status.
  • Develop an ETL pipeline based on the Medallion Architecture to process the collected data and calculate metrics.
  • Create a dashboard to visualize these metrics for end users, using different graphs. Include a role-based recommendation system that suggests optimization techniques in certain cases (e.g., addressing small file issues, data skew, or optimizing reorganization).

Tech stack: Scala / Spark / Delta Lake / Airflow / Docker / Kubernetes / Azure / a dashboarding tool
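
A minimal sketch of the Delta-log collection step (the table path and the selected fields are placeholders): read the latest commit of a table's history and keep the operational metrics that feed the KPI layer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import io.delta.tables.DeltaTable

// Illustrative sketch: the table path and field selection are placeholders.
object DeltaLogCollector {
  /** Latest Delta commit for a table, with the operational metrics recorded in the log. */
  def latestCommitMetrics(spark: SparkSession, tablePath: String): DataFrame =
    DeltaTable.forPath(spark, tablePath)
      .history(1) // most recent commit only
      // operationMetrics holds counters such as numOutputRows or numAddedFiles,
      // depending on the operation (WRITE, MERGE, OPTIMIZE, ...).
      .select("version", "timestamp", "operation", "operationMetrics")
}
```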

Project Overview:

This project focuses on enhancing client support and delivery operations by leveraging data analytics, strategic management frameworks, and process optimization techniques. The objective is to analyze the current support processes, identify inefficiencies, and redesign workflows to improve performance, client satisfaction, and resource utilization.

The project starts with a comprehensive analysis of the current workflows in client engagement and delivery, using historical data from 2023-2024. Advanced business analytics techniques will be employed to extract insights from this data, such as identifying bottlenecks and recurring client issues that delay resolution times.

 

Based on this analysis, the project will propose a re-engineered process that eliminates inefficiencies and better aligns with business goals and client needs. A capacity planning model will also be developed to build a customer support process that allows the company to allocate resources more effectively across departments, using predefined KPIs.

Customer journey mapping will be used to further enhance the client experience, ensuring that all touchpoints—from onboarding to ongoing support—are seamless and efficient. This will result in a client-centric support system that boosts satisfaction and retention.

 

The success of the new processes will be measured by tracking key performance indicators (KPIs) like response times, resolution rates, customer satisfaction scores, and alignment with service level agreements (SLAs). The final outcome will be a strategic roadmap that includes recommendations for further digital transformation and continuous improvement in support and delivery functions.

Skills and Tools Needed:

  • Data Analytics and Business Intelligence: Data analysis, statistical modeling, and data visualization. / Tools: Power BI, Tableau, or Excel for creating dashboards and visualizing key metrics.
  • Strategic Thinking and Problem Solving: Strategic planning, problem-solving, and scenario analysis. / Techniques: SWOT analysis, customer journey mapping, capacity planning models.
  • Automation and Digital Tools: Familiarity with automation tools and customer service technologies. / Tools: AI-based ticket routing systems, JIRA, CRM tools, and customer engagement platforms for process automation.
  • Project Management: Time management, project planning, stakeholder management. / Tools: Microsoft Project or Asana to manage tasks, timelines, and deliverables.

Benefits for the Company:

  • Improved Efficiency: By redesigning workflows and incorporating automation, the company will reduce manual effort and eliminate bottlenecks. This will lead to faster resolution times, better resource utilization, and reduced operational costs.
  • Enhanced Client Satisfaction: The project will improve the customer journey by addressing pain points, ensuring timely responses, and enhancing the overall experience. Higher satisfaction will lead to improved client retention, stronger relationships, and potentially increased revenue.
  • Data-Driven Decision-Making: The project will introduce advanced analytics and a real-time dashboard that will allow management to monitor performance in real-time, and make informed decisions. This data-driven approach ensures that the company can quickly adapt to changing client needs and business conditions.
  • Strategic Capacity Planning: The capacity planning model will ensure that the company has the right resources in the right places at the right times. This will help the company avoid under- or over-staffing, reducing costs and improving team efficiency.

Overall, this project will provide the company with a strategic framework that improves both short-term operational performance and long-term client engagement and support processes. 

Project Overview:

Historically, Jenkins has been our main CI/CD tool, offering extensive flexibility for custom workflows. However, this flexibility comes at the cost of having to maintain a large number of plugins and dependencies. The aim of this project is to transition to a more modern CI/CD platform whose built-in automation features will reduce maintenance complexity. The second part of the project will focus on the integration of Terraform with GitLab to automate infrastructure management using Git as the main source of truth.

Goal: Create a PoC of a CI/CD system migration and infrastructure automation using Terraform and GitLab runners.


Steps:

  • Review current Jenkins pipelines and document key dependencies
  • Set up GitLab runners on Kubernetes
  • Replicate key Jenkins pipelines on GitLab while preserving key functionalities
  • Incorporate GitLab’s built-in features to optimize workflows and ensure scalability and performance
  • Integrate Terraform with GitLab to automate Azure infrastructure management
  • Document the GitLab workflows and conduct a review of the migration from Jenkins

Technologies: Jenkins / GitLab runner / Kubernetes / Argo CD / Terraform / Azure

Project Overview:

We aim to develop a fully free, AI-powered chatbot capable of searching and retrieving information from our extensive documentation stored on Google Drive. Using open-source tools and models, the chatbot will be able to answer user queries by locating relevant content across our files, summarizing the information in a clear response.
The solution will leverage free models like Meta's Llama 2 or Mistral 7B, alongside LangChain for integration and document management, and open-source vector databases and search tools like FAISS for efficient retrieval.
The project offers the opportunity to build a sophisticated document-querying chatbot using cutting-edge, cost-effective AI technologies and to explore data organization and retrieval mechanisms within a real-world business context.

Project Overview:

Effective inventory management is crucial to meet customer demand while minimizing costs.
Forecasting plays a pivotal role in this process by predicting future demand, enabling businesses to optimize their inventory levels and replenishment strategies.
Accurate forecasts provide the foundation for setting key Inventory Replenishment Parameters. These parameters determine the optimal inventory levels, ensuring that stock is maintained efficiently without overstocking or stockouts.
This project aims to assess how forecast performance impacts inventory optimization by analyzing multiple forecast issue scenarios and modeling their impact. The ultimate goal is to determine the right forecast adjustments to improve inventory optimization results, by computing ROI metrics and simulating the impact of different forecast feeds.
Through this analysis, we will explore the balance between demand forecasting performance and inventory optimization results, in order to arrive at the forecast adjustments that lead to the right inventory levels and replenishment strategies.
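
As a small illustration of how forecast quality feeds the replenishment parameters, the sketch below uses the standard reorder-point formula, where the forecast gives expected demand and the forecast error drives the safety stock; the 1.65 service factor (roughly a 95% service level) is a common textbook value, used here as an assumption.

```scala
// Illustrative only: a textbook reorder-point calculation, not the project's actual model.
object Replenishment {
  def safetyStock(forecastErrorStdDev: Double, leadTimeDays: Double, serviceFactor: Double = 1.65): Double =
    serviceFactor * forecastErrorStdDev * math.sqrt(leadTimeDays)

  def reorderPoint(avgDailyForecast: Double, leadTimeDays: Double, safetyStock: Double): Double =
    avgDailyForecast * leadTimeDays + safetyStock

  // Example: forecasting 20 units/day with a 5-day lead time and a forecast error std dev of 6
  // gives a safety stock of ~22 units and a reorder point of ~122 units. A less accurate
  // forecast (larger error) directly inflates the safety stock, which is the trade-off
  // this project sets out to quantify.
}
```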

Project Overview:

The project aims to enhance sensitivity analysis for retail shrinkage by addressing the lack of data in terms of quantity, diversity, and unseen but foreseeable scenarios. Retail sensitivity models often face challenges such as overfitting and unreliable predictions due to limited or heterogeneous datasets. To overcome these challenges, the project will leverage generative AI for the following purposes:

  • The first objective is to use synthetic data generation to create diverse and realistic datasets. This will introduce plausible variations of key features such as crime rates, customer demographics, and new store openings, allowing the model to be trained on a broader range of conditions. By doing so, the project will ensure that the model can make more robust predictions, even in the face of limited or biased historical data.
  • The second objective is to apply counterfactual analysis to simulate plausible “what-if” scenarios, such as shifts in consumer behavior, economic changes, or future crime patterns that have not yet occurred but are foreseeable. By generating and exploring these future-like conditions, the project will enable retailers to anticipate how shrinkage might evolve under different circumstances.

Can't find any projects in your field?