Yongshin Kim

Machine Learning Engineer

Education

Data Science (M.S.)

KAIST | Korea Advanced Institute of Science and Technology

2020 - 2022

Mathematics and Statistics (B.A.)

HGU | Handong Global University

2014 - 2020

Languages

Korean (Native)
English (Professional)

Interests

Retrieval-Augmented Generation
Embedding Model
Large Language Model
Natural Language Processing

Updated April 1, 2026

Welcome to Yongshin's page

I completed an M.S. at the Graduate School of Data Science at KAIST, where I had the privilege of working in the Interactive Computing Lab under the mentorship of Prof. Uichin Lee. My master’s research centered on predicting human emotions through audio and physiological signals, with a particular focus on how data from conversation partners can enhance the accuracy of emotion predictions for speakers during naturalistic interactions.

Previously, I worked as a machine learning engineer at Okestro, where I specialized in Natural Language Processing (NLP). Our team was dedicated to automating corporate analysis in the field of business intelligence using NLP. We developed a service for real-time company evaluation in a cloud environment. We also explored generative models, including the use of Retrieval-Augmented Generation (RAG) to create a domain-specific chatbot. In addition, we worked on a project to fine-tune open-source GPT-style models such as Llama and Orion to improve their inference ability in RAG settings.

I am currently a machine learning engineer at KPMG, where I spearhead the development of a comprehensive AI platform that integrates cutting-edge technologies—including chatbots, advanced data visualization, and specialized data analysis for sectors like manufacturing. My work spans the creation of intuitive chatbot systems capable of handling diverse queries with context-aware precision, as well as visualization tools that interpret conversational inputs to generate relevant charts.

Outside of my professional life, I enjoy staying active through sports, particularly racket games like badminton and tennis. I also find relaxation in playing Go, a game that, despite its simple rules, requires deep strategic thinking. I am known for my consistency and commitment, always striving to approach tasks with responsibility and dedication.

Projects

KPMG | AI-Powered Proposal Finder - We developed an AI-powered service designed to address one of the most time-consuming challenges for consultants: searching for relevant past project proposals. Traditionally, identifying suitable reference proposals requires extensive manual effort and deep domain knowledge. Proposal Finder transforms this process by enabling natural language search, allowing users to describe their needs intuitively. Leveraging advanced query rewriting and tagging-based search technologies, the system refines user input into optimized search queries and matches them against a structured proposal database. This enables rapid discovery of highly relevant proposals, even when exact keywords are not provided. As a result, consultants can significantly reduce time spent on research, improve proposal quality through better references, and focus more on high-value strategic work rather than manual document retrieval.

KPMG | Kbank Loan & Deposit Automation System - In banking operations, reviewing documents for lending and deposit decisions is complex and time-consuming. Institutions must examine corporate bylaws, terms, and authorization documents to assess eligibility and collateral. We developed an AI-powered automation system that streamlines the entire review workflow for lending and deposit operations. Once a user uploads key documents such as corporate bylaws, the system automatically identifies required review items, analyzes document content, and generates structured evaluation results. Leveraging advanced document understanding and LLM-based reasoning, it reduces manual effort, accelerates decision-making, and ensures consistent, scalable review quality.

KPMG | AI-Based Automated Material Analysis System - We developed an AI-powered system that automates the most labor-intensive aspects of material master registration: extracting specifications from vendor documents, standardizing attribute formats, and integrating directly into ERP. By leveraging AI-based text mining and a refined Rule Book for class-specific attribute rules, the solution automatically processes vendor prints, recommends standardized values, and detects duplicates before registration. This innovation reduced manual registration time from 10–15 minutes per material to under a minute—cutting annual workload by more than 700 hours and, for large-scale projects, over 3,300 hours. The system also prevents costly duplicate or “no-code” purchases, expected to save up to ₩1.3 billion annually, while delivering a unified, high-quality material master database.

KPMG | AI-Driven Smart Information Security Disclosure Automation Platform - We developed an AI-powered platform that automated the most time-consuming aspects of corporate information security disclosure: classifying IT and security expense ledgers and calculating costs. Leveraging AI-based keyword classification, the system automatically identified security-related entries in accounting data and extracted the information needed for headcount cost estimation, cutting preparation time by up to 90%. What once took large enterprises weeks could then be completed within a single day, while human-error risks were dramatically reduced and disclosure data accuracy was enhanced.

KPMG | Automated Ad Compliance Analyzer - Before sending advertisements to advertisers, companies must undergo preliminary review based on internal advertising regulations. This preliminary advertising review is not about expertise, but rather a task of meticulously examining content according to internal guidelines. In other words, it's a simple repetitive task. We developed a service that automates this process using Large Language Models (LLMs). The system processes advertising data through image conversion, text extraction, and analysis via LLM to evaluate compliance with regulations. Through this service, company advertising managers can now know the compliance rate of advertisements without needing to conduct preliminary reviews themselves.

KPMG | Trade AI: Revolutionizing Access to U.S. trade administration - We developed a solution that tackles the overwhelming daily document uploads on the U.S. trade administration (https://access.trade.gov/), eliminating the need for customs experts to manually search through vast amounts of data. By implementing a RAG (Retrieval-Augmented Generation) system, we've created an automated workflow that crawls, collects, and vectorizes documents in a Milvus database, allowing for efficient cosine similarity searches. Customers can now instantly discover relevant cases and make inquiries about them through our intuitive chatbot interface. Our query-rewriting technology extracts filtering elements such as year and case codes from customer queries, enabling targeted vector searches and significantly enhancing performance for customs experts.
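The filter-then-rank flow described above can be illustrated with a minimal sketch. It is not the production system: the regular expressions, the `rewrite_query` and `search` helpers, the document fields, and the two-dimensional vectors are all hypothetical stand-ins, and an in-memory list takes the place of the Milvus collection and the embedding model.

```python
import math
import re

# Hypothetical patterns: a four-digit year and a case code shaped like "A-570-979".
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")
CASE_RE = re.compile(r"\b[AC]-\d{3}-\d{3}\b")


def rewrite_query(query):
    """Split a raw query into metadata filters and the remaining free text."""
    year = YEAR_RE.search(query)
    case = CASE_RE.search(query)
    filters = {
        "year": int(year.group()) if year else None,
        "case_code": case.group() if case else None,
    }
    text = CASE_RE.sub(" ", YEAR_RE.sub(" ", query))
    return filters, " ".join(text.split())


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(docs, query_vec, filters, top_k=3):
    """Apply the metadata filters first, then rank the survivors by cosine similarity."""
    candidates = [
        d for d in docs
        if all(v is None or d.get(k) == v for k, v in filters.items())
    ]
    ranked = sorted(candidates, key=lambda d: cosine(d["vec"], query_vec), reverse=True)
    return ranked[:top_k]


# Toy corpus; the vectors are hand-made placeholders for real embeddings.
docs = [
    {"id": "d1", "year": 2023, "case_code": "A-570-979", "vec": [1.0, 0.0]},
    {"id": "d2", "year": 2023, "case_code": "A-570-979", "vec": [0.6, 0.8]},
    {"id": "d3", "year": 2021, "case_code": "C-533-844", "vec": [1.0, 0.0]},
]

filters, text = rewrite_query("2023 rulings for case A-570-979 on solar cells")
hits = search(docs, [0.0, 1.0], filters)
hit_ids = [d["id"] for d in hits]
```

Filtering on the extracted year and case code before ranking keeps unrelated cases out of the candidate set, so the similarity search only has to order documents that already match the user's stated constraints.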

KPMG | Empowering Security Policy Analysis with AI - Reviewing and improving corporate security policies and guidelines is a challenging process that often leads to inconsistent quality based on the reviewer's expertise. Our innovative tool leverages Large Language Models to analyze security policies with consistent performance and accuracy. By utilizing KPMG's extensive database of over 300 security-related resources and AI-powered web searches, we comprehensively review policies to identify missing elements and ensure alignment with the latest security trends. The system delivers rapid, precise assessments while maintaining uniform quality standards regardless of the complexity of the security framework. Additionally, we provide an intuitive chatbot service that allows users to ask questions based on the analysis results, making security policy management more efficient and effective.

[Video] Security Policy Analysis Tool: Powered by AI

Okestro | Create an LLM model for RAG system - Korean large language models (LLMs) specialized for high-performance retrieval-augmented generation (RAG) systems are very rare. We created a Korean LLM specialized for RAG using Orion-14B as the foundation model. We selected a total of 800,000 samples from AI Hub, refined them, and used GPT-4 to create 8,000 high-quality RAG training examples. We trained on multiple A100 GPUs in parallel with distributed data parallel (DDP) and applied low-rank adaptation (LoRA).
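For readers unfamiliar with LoRA, the standard formulation is sketched below. The symbols (frozen weight W_0, trainable factors B and A, rank r, scaling factor alpha) follow the usual LoRA notation and are not taken from the project, whose rank and scaling values are not stated here.

```latex
W' = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)

\text{trainable parameters per adapted matrix: } r\,(d + k) \quad \text{instead of} \quad d\,k
```

As a purely illustrative calculation, with d = k = 4096 and r = 8 the adapter trains 65,536 parameters per matrix instead of 16,777,216, which is why LoRA pairs well with data-parallel training across several GPUs.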

Okestro | Korean Embedding Model Pipeline in Closed Network RAG System - To provide chatbot services to customers in a closed network environment, it is essential to supply our own embedding model. We worked on a project to secure a high-performance Korean embedding model. As the project leader, I surveyed several baseline Korean embedding models on Hugging Face. We selected a baseline model and automated the construction of a dataset for fine-tuning it. By fine-tuning on this dataset, we obtained a Korean embedding model that performed significantly better than OpenAI's text-embedding-ada-002.

Okestro | Deploying chatbot service specific to cloud domain - We are developing a chatbot specific to the cloud domain using a Retrieval-Augmented Generation (RAG) system. With it, we can primarily handle product-related inquiries from in-house customers. We are also working on how to generate answers that users can trust in a RAG system.

[PPT] RAG system

Okestro | FAQ Classification - This paper contributes to enhancing the experience of users and administrators in the FAQ system. We present a model that uses BERT and contrastive learning to automatically classify users' inquiries. The model eliminates the need for users to have domain knowledge or for administrators to classify multiple inquiries. This can significantly improve the efficiency of the inquiry management system, reduce the workload of users and administrators and enhance the overall user experience.

Yongshin Kim, Taehee Lee, Sanghyeon Jung, Chanjae Lee, Taewan Kwon, “Improving Cloud FAQ Experience through Contrastive Learning-based Inquiry Classification”, Korea Computer Congress, Jeju Island, Korea (2023)
[PPT] KCC 2023

Okestro | Log-Level Anomaly Detection - We present FineLog, a log message-wise anomaly detection framework that enables anomaly detection for each log message in the context of a given sequence. This study shows that FineLog not only records high performance in the existing sequence unit anomaly detection, but also records high performance in log unit anomaly detection.

Taewan Kwon, Chanjae Lee, Sanghyeon Jung, Yongshin Kim, Taehee Lee, “Framework for Log-Level Anomaly Detection in a Log Sequence using Bidirectional Encoder Representations from Transformers”, Korea Computer Congress, Jeju Island, Korea (2023)

Okestro | Baseline of SWOT Classification - We present baseline indicators by applying the BERT model, which is widely used in natural language processing, to SWOT analysis for the first time. Building on this approach, we expect these baseline indicators to be useful for business intelligence cloud platforms that make deep-learning-based SWOT analysis easily accessible to all stakeholders.

Yongshin Kim, Sanghyeon Jung, Chanjae Lee, Jinhee Kim, “Baseline of SWOT Classification using Bidirectional Encoder Representations from Transformers for Business Intelligence Cloud Platform”, Korean Institute of Information Technology, 2022 Fall Conference, Jeju Island, Korea (2022)
[PPT] SWOT

Okestro | Automation of Company SWOT Analysis Using Sentence BERT - This study presents SWOT Sentence BERT, an AIaaS model that can intelligently automate company SWOT analysis. SWOT Sentence BERT is a sentence embedding model trained on SWOT text data processed into the form of a natural language inference task. To automate SWOT analysis, we applied the K-Means clustering algorithm to the sentence embeddings and classified each sentence according to its predicted cluster.

Sanghyeon Jung, Yongshin Kim, Kwangpil Jeong, Taehee Lee, Taewan Kwon, “Automation of Company SWOT Analysis Using Sentence BERT”, Korean Institute of Information Technology, 2022 Fall Conference, Jeju Island, Korea (2022)
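The clustering step in this project can be illustrated with a minimal sketch of K-Means over sentence embeddings. It is only an illustration under stated assumptions: the `kmeans` helper, the two-dimensional toy vectors, and the choice of two clusters are hypothetical, and the real embeddings would come from the trained sentence embedding model rather than hand-written lists.

```python
def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy "sentence embeddings" (hypothetical values): two tight groups in 2-D.
embeddings = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]
labels, centroids = kmeans(embeddings, k=2)
```

Each sentence is then assigned the category of its cluster, which is what allows unlabeled SWOT sentences to be classified from their embeddings alone.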

KAIST | Emotion Recognition - In this study, we propose a method for predicting a speaker's emotions in naturalistic conversation using a speaker encoder and a counterpart encoder, both composed of CNN-LSTM deep learning networks. We used K-EmoCon, an emotion dataset collected during debates, to empirically evaluate our model. The results showed that the counterpart’s speech and physiological signals had a positive impact on predicting the speaker’s emotions.

Yongshin Kim, “Improving Multi-modal Emotion Recognition with Counterpart Data in Dyadic Conversations”, Master thesis, 2022 Spring
[PPT] Master Defense

Deepbrain AI | Video & Speech synthesis - We produced AI Human through voice conversion using StarGAN-VC and image synthesis using FSGAN. Through the process of analyzing various voice conversion models(CycleGAN, StarGAN-VC, StarGAN-VC2), we improved the quality of voice conversion by analyzing loss terms, number of domain classes, batch size, and iteration. In addition, the average similarity between the source video and the target video was used to facilitate video synthesis.

KAIST | Digital Therapeutics (DTx) - We develop fundamental technologies for data-driven digital therapeutics, including receptivity optimization for mobile digital therapeutic development. Furthermore, we analyze the effectiveness of digital treatments by applying causal analysis.

Yongshin Kim, Panyu Zhang, Gyuwon Jung, Heepyung Kim, Uichin Lee, “Causal Analysis of Observational Mobile Sensor Data: A Comparative Study”, Korea Computer Congress, 2021 Spring Conference, Jeju Island, Korea (2021)
[PPT] KCC 2021

KAIST | Smart Mask - We review existing types of smart masks and study what sensor data can be collected through smart masks. Also, we develop emotional and stress monitoring algorithms through smart mask sensor data.

Peter Lee, Heepyung Kim, Yongshin Kim, Woohyeok Choi, M. Sami Zitouni, Ahsan Khandoker, Herbert F. Jelinek, Leontios Hadjileontiadis, Uichin Lee, Yong Jeong, “Beyond Pathogen Filtration: Possibility of Smart Mask as Wearable Device for Personal and Group Health and Safety Management”, Journal of Medical Internet Research, 2022

KAIST | Contact Tracing - We develop risk scoring algorithms for analyzing BLE contact detected between client devices. In addition, the need for place beacons is growing, as more people are infected with COVID simply by staying in the same place as a confirmed patient, without direct contact. We also identify the coverage within which a place beacon can effectively send and receive signals with client devices.

[PPT] Controlled Experiments

KAIST | [KSE801] Recommender System and Graph Machine Learning - We developed SRM(Stress Recommendation system with Mobile sensor data), a recommendation system to relieve human stress by utilizing mobile sensor data. Unlike previous recommendation systems using correlation and cosine similarity, SRM can more accurately identify and recommend factors that cause stress because it operates based on causal analysis using a counterfactual approach. (Course Project)

KAIST | [KSE531] Human-Computer Interaction: Theory and Design - We introduce a new interpretation of art experience by presenting CAN(Communicative Art Network), which allows viewers to achieve self-reliant thinking based on their art experience. CAN is a platform that consists of CAN AR Comment System and CAN SNS. (Course Project)

KAIST | [KSE526] Analytical Methodologies for Big Data - We propose EFNet(Energy usage Forecasting Network) which forecasts energy usage by using attention based CNN-LSTM networks. EFNet is a sequential prediction model that reflects the characteristics of variables such as weather, calendar, oil price, and COVID-19 confirmed cases. (Course Project)

KAIST | [KSE801] Sensor Data Science - Wearable devices and smartphones can be used to track a person's body and physiological conditions to determine the relationship between physical activity and stress level. The goal of the research was to predict people’s daily stress level from three different sets of data: 1) physical data, 2) user info data, and 3) ESM data. (Course Project)

[PPT] Emotion recognition with mobile phone & wearable sensor data

Experiences

Machine Learning Engineer

2024.08 - Present

KPMG

I am currently employed at KPMG as a machine learning engineer, focusing on the development of an AI platform that offers a range of AI technologies such as Natural Language Understanding (NLU) and Natural Language Generation (NLG) to our clients.

Machine Learning Engineer

2022.09 - 2024.08

Okestro

I worked as a machine learning engineer focusing on NLP. Our team concentrated on generative models, particularly the use of Retrieval-Augmented Generation to build a chatbot capable of responding to domain-specific queries. We used the LangChain library to simplify this process. We also worked on a project to fine-tune open-source GPT-style models such as Llama to acquire expertise in our specific domain.

Researcher

2020.08 - 2022.07

Interactive Computing Lab, KAIST

I participated in the Digital Therapeutics (DTx), Smart Mask, and Contact Tracing projects. I focused on the DTx project, developing a platform that analyzes the effectiveness of digital treatments by applying causal analysis techniques such as matching and CCM. I also developed algorithms and interactive visualization platforms that use human biometric data to predict stress levels.

Director

2019.12 - 2020.07

Dep. of General Affairs, Student Government, HGU

I worked as the general affairs director in the student government. I managed a budget of about 100 million won for student expenses and established and executed funding plans for projects large and small. I also improved how organizational documents were maintained and updated, how necessary policies were implemented, and the policies and procedures related to human resources.

Data Analyst

2018.08 - 2020.07

Technological entrepreneurship Lab, HGU

As an undergraduate researcher, I worked at the Technological entrepreneurship Lab supervised by Doohee Chung for data analysis. As a data analyst, I participated in various projects related to Technological Entrepreneurship. In particular, I used a number of statistical techniques, including hierarchical regression, to analyze moderating effects, mediating effects, etc. through SPSS, STATA, and AMOS.

Operation Manager

2018.03 - 2018.09

Handong English Camp, HGU

I worked as a general manager at an English camp for about 300 elementary and middle school students. I operated all the programs and other things related to the two-month camp schedule. From the camp preparation stage to the start and end of the camp, the whole process was in English.

English Instructor

2017.03 - 2017.11

Global Vision Christian School (GVCS)

I taught English classes for about 100 elementary and middle school students at GVCS, located in Sejong City, South Korea. I served as an English instructor on equal footing with 30 native-speaker instructors. In addition to teaching, I worked as an English interpreter for various events.

Accountant

2017.03 - 2019.08

International freshmen orientation, HGU

I participated in the orientation for 200 foreign freshmen as an accountant for a total of 6 semesters. I managed a budget of 30 million won each time and set up and implemented a program funding plan. Since most of the staff were also foreigners, we communicated in English from the preparation process to the end of the orientation.

Publications

Beyond Pathogen Filtration: Possibility of Smart Mask as Wearable Device for Personal and Group Health and Safety Management

Peter Lee, Heepyung Kim, Yongshin Kim, Woohyeok Choi, M. Sami Zitouni, Ahsan Khandoker, Herbert F. Jelinek, Leontios Hadjileontiadis, Uichin Lee, Yong Jeong

Journal of Medical Internet Research, 2022

Nonlinear Relationship Between Technological Entrepreneurship and National Competitiveness: The Moderation Effect of Innovation-driven Economy

Seung-Lin Yang, Yongshin Kim, Doohee Chung

Journal of Technology Innovation, 2019 [PPT]

The Effect of Intellectual Property-Based Startups on Employment

Haejun Jung, Yongshin Kim, Doohee Chung

Korea Technology Innovation Society, 2019 [PPT]

Patents

Kim. Y, Lee. T, Jung. S, "AN INQUIRY MANAGEMENT SYSTEM USING CLASSIFICATION METHOD BASED IN CLOUD SERVICE AND A PLATFORM FOR INQUIRY-RESPONSE INTEGRATED MANAGEMENT"

KR - Application No.10-2022-0190619

Jeong. K, Kim. Y, Ahn. S, Lee. T, Kim. Y, Kim. M, "A CLOUD SERVER OPERATING SYSTEM IMPLEMENTING INDIVIDUAL VIRTUALIZATION OF RESOURCES AND A METHOD FOR OPERATING CLOUD SERVER"

KR - Application No.10-2022-0190619

Jung. S, Kwon. T, Lee. T, Kim. Y, Lee. C, Kim. J, "A CREATION MODULE FOR AUTOMATIC SWOT ANALYSIS TOOL USING ARTIFICIAL INTELLIGENCE AND A SWOT ANALYSIS SYSTEM COMPRISING THE SAME"

KR - Application No.10-2022-0179834

Personal Studies

Deep Learning - While studying deep learning as an NLP engineer, I made slides on topics I have been curious about or want to discuss.

Fun - Areas that I study and find fascinating when I have free time.

Wiki - As an engineer, I gathered useful information for studying and collaborating with people.

Skills & Proficiency

Python

SQL

R

SPSS & AMOS

STATA