Our hand-picked team consists of passionate, hard-working, collaborative, and forward-thinking
individuals with one common interest in mind: a love for data and retail.
Project Overview:
Create a generic microservice that, given an expression (similar to what we currently use in our Advanced Search), generates the necessary SQL queries to fetch data from Postgres.
Requirements:
Stretch Goals: Support Postgres fuzzy matching and aggregations on the data.
Technologies: Scala / Postgres / Docker / Kubernetes
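To make the idea concrete, here is a minimal sketch of the expression-to-SQL translation. The tiny AST (Eq / And / Or), the table name, and the column names are illustrative assumptions, not the actual Advanced Search grammar; note that values are emitted as bind parameters rather than interpolated into the SQL string.

```python
# Illustrative sketch: translate a search-expression tree into a
# parameterized SQL query. The AST below is an assumption for the demo.
from dataclasses import dataclass

@dataclass
class Eq:          # column = value
    column: str
    value: str

@dataclass
class And:         # left AND right
    left: object
    right: object

@dataclass
class Or:          # left OR right
    left: object
    right: object

def to_sql(expr):
    """Render the expression as a predicate string plus bind parameters."""
    if isinstance(expr, Eq):
        return f"{expr.column} = %s", [expr.value]
    op = "AND" if isinstance(expr, And) else "OR"
    ls, lp = to_sql(expr.left)
    rs, rp = to_sql(expr.right)
    return f"({ls} {op} {rs})", lp + rp

def select_query(table, expr):
    """Build the full query; parameters are passed to the driver separately."""
    where, params = to_sql(expr)
    return f"SELECT * FROM {table} WHERE {where}", params
```

Passing the values back as a separate parameter list keeps the generated queries safe from SQL injection, which matters for a service that accepts arbitrary user expressions.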
Project Overview:
Most NoSQL databases do not support a rollback mechanism because of the nature of the DB. The task here is to implement a library that is easy to integrate into any codebase and allows data to be recovered if something goes wrong programmatically. Ideally, it should work with any NoSQL DB, and data recovery should be possible asynchronously.
Requirements:
Technologies: Scala / Cassandra / Mongo / Redis / Kafka / RabbitMQ / Kubernetes
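One classic approach such a library could take is an undo log: record the previous value before every write, then replay the log in reverse to roll back. The sketch below uses a plain dict as a stand-in for any NoSQL client; in a real version the log could be shipped to Kafka or RabbitMQ so recovery runs asynchronously.

```python
# Sketch of a rollback mechanism for stores with no native transactions.
# The dict-backed store and class names are illustrative assumptions.
_MISSING = object()  # sentinel: key did not exist before the write

class RollbackSession:
    def __init__(self, store):
        self.store = store   # any mapping-like NoSQL client wrapper
        self.undo_log = []   # (key, previous value) pairs, append-only

    def put(self, key, value):
        # Capture the prior state before mutating the store.
        self.undo_log.append((key, self.store.get(key, _MISSING)))
        self.store[key] = value

    def delete(self, key):
        if key in self.store:
            self.undo_log.append((key, self.store[key]))
            del self.store[key]

    def rollback(self):
        # Replay the log newest-first to restore the original state.
        while self.undo_log:
            key, previous = self.undo_log.pop()
            if previous is _MISSING:
                self.store.pop(key, None)
            else:
                self.store[key] = previous
```

The undo log is the piece that generalizes across Cassandra, Mongo, and Redis: only the `put`/`delete` adapters need to change per backend.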
Project Overview:
The Frontend Performance Monitoring Dashboard is a tool for tracking and visualizing key performance metrics across frontend applications. It collects data on page load times, resource usage, rendering speed, and user interactions to help the frontend team understand how their apps perform in real-world conditions. The tool includes a backend for storing historical data and generating performance reports, allowing the team to analyze and optimize over time.
Technologies:
Frontend: React, Chart.js, D3.js, Material UI. Backend: Node or Scala; MongoDB, PostgreSQL. Data collection: performance monitoring libraries such as Web Vitals, and possibly a third-party service like Sentry or Google Analytics for deeper analytics and comparison.
Project Overview:
This project focuses on helping retailers optimize their supply chain by predicting stock levels and forecasting product demand. The platform will allow retailers to track product movement, forecast future demand based on historical sales data, and suggest replenishment orders with suppliers to avoid stockouts and overstocking. By using machine learning or basic statistical methods to analyze sales patterns and other external factors (seasonality, promotions, etc.), the system can provide more accurate inventory management recommendations. This system will have multiple pages to manage inventory, forecast demand, track suppliers, and generate reports.
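As a starting point, the forecasting core can be sketched with a simple moving-average model and a replenishment suggestion derived from it. The window size, lead time, and function names below are illustrative assumptions; a real system would fold in seasonality, promotions, and ML models as described above.

```python
# Hedged sketch of the forecast -> replenishment chain, using a plain
# moving average as the simplest possible demand model.
def moving_average_forecast(sales, window=4):
    """Forecast next-period demand as the mean of the last `window` periods."""
    recent = sales[-window:]
    return sum(recent) / len(recent)

def replenishment_order(sales, on_hand, lead_time=2, window=4):
    """Suggest an order quantity covering forecast demand over the lead time."""
    per_period = moving_average_forecast(sales, window)
    needed = per_period * lead_time - on_hand
    return max(0, round(needed))  # never suggest a negative order
```

Swapping `moving_average_forecast` for a seasonal or ML-based model is the natural upgrade path without changing the replenishment logic.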
Technologies:
Project Overview:
Design a tool that enables software engineers to visualize and analyze dependencies across application components, giving a clearer understanding of how the different parts of the application interact and providing insight into areas of tight code coupling.
Goals:
Key features (non-exhaustive):
Technologies:
Project Overview:
The goal of this project is to develop a user interface (UI) application that allows users to construct, configure, and preview settings for the main application. The UI app will enable users to adjust configurations, preview their changes in real time, and save the configurations to be dynamically served to the main app. These configurations will be handled through a Node.js backend, and NGINX will be used as a reverse proxy to forward configuration requests from the UI to the Node.js server.
Technologies:
Project Overview:
Making machine learning models more interpretable and trustworthy for demand forecasting by applying explainable AI techniques. Use methods like SHAP (SHapley Additive exPlanations) or LIME (Local Interpretable Model-agnostic Explanations) to explain predictions made by both simple and complex models (e.g., linear regression vs. deep learning) to non-technical stakeholders.
Project Overview:
Past forecast assessments revealed cases where products appear suited for grouping under a new level, termed ‘super_SKUs.’ The current grouping method relies on time series patterns, but incorporating semantic understanding could improve accuracy and add valuable business insight.
This internship will explore adding semantic similarity metrics directly to the SKU detection process or as a post-processing validation step for ‘super_SKUs.’ This approach will leverage a language model, such as product embeddings from transactions, or involve fine-tuning an existing LLM architecture.
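The post-processing validation step could look like this: accept a proposed 'super_SKU' only if every pair of member products is semantically close in embedding space. The toy 3-dimensional embeddings and the 0.9 threshold are illustrative; real embeddings would come from an LLM or be learned from transactions.

```python
# Sketch of semantic validation for a proposed 'super_SKU' grouping.
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def validate_super_sku(embeddings, threshold=0.9):
    """True only if all member pairs exceed the similarity threshold."""
    return all(
        cosine_similarity(embeddings[i], embeddings[j]) >= threshold
        for i in range(len(embeddings))
        for j in range(i + 1, len(embeddings))
    )
```

Groupings that pass the time-series criterion but fail this check are exactly the cases where semantic understanding adds the business insight mentioned above.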
Project Overview:
In many data workflows, certain data is ingested sporadically, with new files or updates appearing irregularly rather than on a predictable schedule. Setting up a daily processing pipeline for such data can be costly and inefficient, as it often sits idle. An event-driven pipeline addresses this by triggering data processing only when new data arrives, ensuring resources are used effectively and data processing is initiated exactly when needed.
Goal: Create an event-driven pipeline to process data based on events, mainly file uploads to the data lake.
Steps:
Tech Stack:
Scala / Python / Spark / Prefect or Kafka / Airflow / Docker / Kubernetes / Azure Data Lake / PostgreSQL
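The pipeline's core dispatch logic can be sketched as an event handler that fires only when a matching file lands in the lake. The event shape, folder-based routing convention, and handler names are assumptions for illustration; in practice the events would come from Azure storage notifications via Kafka or a Prefect/Airflow sensor.

```python
# Toy sketch of event-driven dispatch: react to upload events instead
# of polling on a schedule. Routing by path prefix is an assumed convention.
processed = []  # stand-in for triggering real Spark jobs

def process_sales(path):
    processed.append(("sales", path))

def process_inventory(path):
    processed.append(("inventory", path))

HANDLERS = {
    "sales/": process_sales,
    "inventory/": process_inventory,
}

def on_upload_event(event):
    """Trigger processing only when a recognized new file arrives."""
    path = event["path"]
    for prefix, handler in HANDLERS.items():
        if path.startswith(prefix):
            handler(path)
            return True
    return False  # unknown data: ignore, no pipeline run
```

The key property is that compute is spent only on `on_upload_event` calls, which is exactly what makes this cheaper than an idle daily pipeline.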
Project Overview:
CRSP, short for Cognira’s Retail Science Platform, is an internal platform that provides the tools and infrastructure needed to manage big data and run complex transformation pipelines.
Goal: Build a metadata management system for the datasets ingested and processed by CRSP to improve data lineage and discoverability.
Steps:
Tech stack: Scala / Spark / Azure Data Lake / OpenMetadata / PostgreSQL / Docker / Kubernetes / Airflow
Project Overview:
As Spark jobs scale on Kubernetes, performance bottlenecks can lead to inefficient resource usage, high costs, and delayed processing times. Identifying and resolving these bottlenecks often requires deep expertise in Spark configurations and code optimizations. A tool that leverages a language model (LLM) to analyze job performance, pinpoint bottlenecks, and suggest targeted code or configuration adjustments could empower engineers to optimize their Spark workloads more effectively, reducing latency and improving resource efficiency.
Goal: Develop a tool that helps identify bottlenecks in Spark jobs running on Kubernetes and suggests code and configuration improvements using an LLM.
Steps:
Tech stack: Scala / Spark / Kubernetes / Prometheus / Llama / React
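Before involving the LLM, a first useful step is flagging classic Spark bottlenecks from job metrics with simple rules; the LLM can then turn these findings into concrete code or configuration suggestions. The metric names and thresholds below are assumptions, loosely modeled on what Prometheus might expose.

```python
# Illustrative rule-based pre-filter for Spark bottleneck detection.
def detect_bottlenecks(metrics):
    """Return human-readable findings from a dict of job metrics."""
    findings = []
    if metrics["shuffle_read_gb"] > 100:
        findings.append("heavy shuffle: consider repartitioning or broadcast joins")
    if metrics["max_task_s"] > 10 * metrics["median_task_s"]:
        findings.append("task skew: check for skewed join/groupBy keys")
    if metrics["gc_time_ratio"] > 0.2:
        findings.append("GC pressure: increase executor memory or tune partitions")
    return findings
```

Feeding these findings plus the job's code into the LLM prompt keeps the model grounded in observed behavior rather than guessing from source alone.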
Project Overview:
Build an API-driven tool that allows users to request custom metrics, triggering a Spark job to calculate these metrics and return the results in real time or near-real time, depending on processing complexity. This tool will be capable of handling both complex and simple metric computations, making it a versatile addition to data analytics for retailers.
Goal: Develop an API-driven tool that triggers Spark jobs to compute user-requested metrics and returns the results in real time or near-real time.
Steps:
Spark Job Design for Metric Calculations
Data Storage and Access
Stretch goals: Error handling, validation, and logging
Monitoring and Scaling
Create a UI
Tech stack: Scala / Spark / Airflow / Kubernetes / Kafka / Cassandra/DuckDB/Postgres / Prometheus/Grafana
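The request-to-computation path can be sketched as a registry that maps a requested metric name to a computation over rows, standing in for the Spark job the API would actually submit. Metric names, the row shape, and the response format are illustrative assumptions.

```python
# Toy sketch of the metrics API dispatch: validate, compute, reply.
METRICS = {
    "total_revenue": lambda rows: sum(r["price"] * r["qty"] for r in rows),
    "units_sold":    lambda rows: sum(r["qty"] for r in rows),
}

def handle_metric_request(name, rows):
    """Look up the requested metric, run it, and return an API-style reply."""
    if name not in METRICS:
        return {"status": "error", "message": f"unknown metric: {name}"}
    return {"status": "ok", "metric": name, "value": METRICS[name](rows)}
```

In the real tool, simple metrics could be served directly like this while complex ones enqueue a Spark job and return a handle for polling, which is where the real-time vs. near-real-time split comes from.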
Project Overview:
The project involves designing and developing a solution to create a centralized repository of operational metrics for monitoring the Lakehouse (data ingress).
Steps:
Tech stack: Scala / Spark / Delta Lake / Airflow / Docker / K8s / Azure / Dashboarding tool (…)
Project Overview:
This project focuses on enhancing client support and delivery operations by leveraging data analytics, strategic management frameworks, and process optimization techniques. The objective is to analyze the current support processes, identify inefficiencies, and redesign workflows to improve performance, client satisfaction, and resource utilization.
The project starts with a comprehensive analysis of the current workflows in client engagement and delivery, using historical data from 2023-2024. Advanced business analytics techniques will be employed to extract insights from this data, such as identifying bottlenecks and recurring client issues that delay resolution times.
Based on this analysis, the project will propose a re-engineered process that eliminates inefficiencies and better aligns with business goals and client needs. A capacity planning model will also be developed to build a customer support process allowing the company to allocate resources more effectively across departments using predefined KPIs.
Customer journey mapping will be used to further enhance the client experience, ensuring that all touchpoints—from onboarding to ongoing support—are seamless and efficient. This will result in a client-centric support system that boosts satisfaction and retention.
The success of the new processes will be measured by tracking key performance indicators (KPIs) like response times, resolution rates, customer satisfaction scores, and alignment with service level agreements (SLAs). The final outcome will be a strategic roadmap that includes recommendations for further digital transformation and continuous improvement in support and delivery functions.
Skills and Tools Needed:
Benefits for the Company:
Overall, this project will provide the company with a strategic framework that improves both short-term operational performance and long-term client engagement and support processes.
Project Overview:
Historically, Jenkins has been our main CI/CD tool, offering extensive flexibility for custom workflows. However, this flexibility comes at the cost of maintaining a large number of plugins and dependencies. The aim of this project is to transition to a more modern CI/CD platform with built-in automation features that will reduce maintenance complexity. The second part of the project will focus on integrating Terraform with GitLab to automate infrastructure management, using Git as the main source of truth.
Goal: Create a PoC of a CI/CD system migration and infrastructure automation using Terraform and GitLab runners.
Steps:
Technologies: Jenkins / GitLab runner / Kubernetes / Argo CD / Terraform / Azure
Project Overview:
We aim to develop a fully free, AI-powered chatbot capable of searching and retrieving information from our extensive documentation stored on Google Drive. Using open-source tools and models, the chatbot will answer user queries by locating relevant content across our files and summarizing the information in a clear response.
The solution will leverage free models like Meta’s Llama 2 or Mistral 7B, alongside LangChain for easy integration and document management, and open-source vector database and search tools like FAISS for efficient retrieval.
The project offers the opportunity to build a sophisticated document-querying chatbot using cutting-edge, cost-effective AI technologies and to explore data organization and retrieval mechanisms within a real-world business context.
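The retrieval step at the heart of the chatbot can be sketched as: embed the documents and the query, then return the top-k closest documents to feed the LLM. The bag-of-words "embedding" below stands in for a real embedding model plus a FAISS index, and the document texts are made up.

```python
# Toy retrieval sketch: rank documents by similarity to a query.
from collections import Counter
import math

def embed(text):
    """Stand-in embedding: word counts instead of a real model's vectors."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query.
    A FAISS index would replace this linear scan at scale."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]
```

The retrieved passages would then be packed into the LLM prompt so the model summarizes grounded content instead of hallucinating answers.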
Project Overview:
Effective inventory management is crucial to meet customer demand while minimizing costs.
Forecasting plays a pivotal role in this process by predicting future demand, enabling businesses to optimize their inventory levels and replenishment strategies.
Accurate forecasts provide the foundation for setting key Inventory Replenishment Parameters. These parameters determine the optimal inventory levels, ensuring that stock is maintained efficiently without overstocking or stockouts.
This project aims to assess how forecast performance impacts inventory optimization by analyzing multiple forecast scenarios and modeling their impact. Ultimately, the goal is to determine the right forecast adjustments to improve inventory optimization results, by computing ROI metrics and simulating the impact of different forecast feeds.
Through this analysis, we will explore the balance between demand forecasting performance and inventory optimization results, in order to identify the forecast adjustments that lead to the right inventory levels and replenishment strategies.
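To illustrate how forecast quality feeds the replenishment parameters: the reorder point combines expected demand over the supplier lead time with a safety stock sized by the forecast error. The formulas are the standard textbook ones; the service-level z value and the numeric inputs are illustrative.

```python
# Worked sketch linking forecast error to replenishment parameters.
import math

def safety_stock(z, forecast_error_std, lead_time):
    """Safety stock = z * sigma * sqrt(lead time):
    a worse forecast (larger sigma) directly inflates the buffer."""
    return z * forecast_error_std * math.sqrt(lead_time)

def reorder_point(mean_daily_demand, lead_time, z, forecast_error_std):
    """Reorder when stock falls to expected lead-time demand plus safety stock."""
    return mean_daily_demand * lead_time + safety_stock(
        z, forecast_error_std, lead_time
    )
```

This is the lever the analysis exploits: halving the forecast error halves the safety stock, which is what makes the ROI of forecast adjustments quantifiable.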
Project Overview:
The project aims to enhance sensitivity analysis for retail shrinkage by addressing the lack of data in terms of quantity, diversity, and unseen but foreseeable scenarios. Retail sensitivity models often face challenges such as overfitting and unreliable predictions due to limited or heterogeneous datasets. To overcome these challenges, the project will leverage generative AI for the following purposes:
Founded by experienced data scientists and retail experts, Cognira is the leading artificial intelligence solutions provider.