Welcome to Yongshin's page

I graduated with an M.S. from the Graduate School of Data Science at KAIST, where I was a member of the Interactive Computing Lab under the guidance of Prof. Uichin Lee. My master’s research focused on predicting human emotion using audio and physiological signals, specifically analyzing how the counterpart’s data positively influences the speaker’s emotion prediction during naturalistic conversations.

Previously, I worked as a machine learning engineer at Okestro, where I specialized in Natural Language Processing (NLP). Our team was dedicated to automating corporate analysis in the field of business intelligence using NLP. We developed a service for real-time company evaluation in a cloud environment. We also explored generative models, including the use of Retrieval-Augmented Generation (RAG) to create a domain-specific chatbot. In addition, we worked on a project to fine-tune open-source GPT-style models such as Llama and Orion to improve their inference ability in RAG settings.

I am currently employed at KPMG as a senior consultant, focusing on the development of an AI platform that offers a range of AI technologies such as Natural Language Understanding (NLU) and Natural Language Generation (NLG) to our clients.

Outside of my professional life, I enjoy staying active through sports, particularly racket games like badminton and tennis. I also find relaxation in playing Go, a game that, despite its simple rules, requires deep strategic thinking. I am known for my consistency and commitment, always striving to approach tasks with responsibility and dedication.

Projects

KPMG | Building an AI Platform - We are currently developing a comprehensive AI platform that integrates cutting-edge technologies such as chatbots, data visualization, and specialized data analysis for the manufacturing sector. Our chatbot system is designed to handle queries across a wide range of domains, providing accurate and context-aware responses. Through this platform, we aim to deliver a robust solution that addresses diverse industry needs while driving efficiency and innovation.
Okestro | Creating an LLM for a RAG System - Korean large language models (LLMs) specialized for high-performance retrieval-augmented generation (RAG) systems are very rare. We created a Korean LLM specialized for RAG using Orion-14B as the foundation model. A total of 800,000 data samples were selected from AI Hub. This data was refined, and GPT-4 was used to create 8,000 high-quality RAG training examples. We used distributed data parallel (DDP) training across multiple A100 GPUs and applied low-rank adaptation (LoRA).
Okestro | Korean Embedding Model Pipeline in Closed Network RAG System - In order to provide chatbot services to customers in a closed network environment, it is essential to provide an embedding model. We worked on a project to secure a high-performance Korean embedding model. As the project leader, I researched several baseline Korean embedding models on Hugging Face. We selected a baseline model and automated the construction of a dataset for embedding model fine-tuning. By fine-tuning the embedding model on these datasets, we obtained a Korean embedding model with significantly better performance than the text-embedding-ada-002 model provided by OpenAI.
Okestro | Deploying a Chatbot Service Specific to the Cloud Domain - We are developing a chatbot specific to the cloud domain using a Retrieval-Augmented Generation (RAG) system. With it, we can primarily handle product-related inquiries from in-house customers. We are also working on how to generate answers that users can trust in the RAG system.
Okestro | FAQ Classification - This paper contributes to enhancing the experience of users and administrators in the FAQ system. We present a model that uses BERT and contrastive learning to automatically classify users' inquiries. The model eliminates the need for users to have domain knowledge or for administrators to classify multiple inquiries. This can significantly improve the efficiency of the inquiry management system, reduce the workload of users and administrators and enhance the overall user experience.
Okestro | Log-Level Anomaly Detection - We present FineLog, a message-level anomaly detection framework that detects anomalies for each log message in the context of a given sequence. This study shows that FineLog achieves high performance not only in conventional sequence-level anomaly detection but also in message-level anomaly detection.
Okestro | Baseline of SWOT Classification - We present baseline indicators by introducing the BERT model, which is widely used in natural language processing, to SWOT analysis for the first time. These baseline indicators are expected to be useful for business intelligence cloud platforms that make deep-learning-based SWOT analysis easily accessible to all stakeholders.
Okestro | Automation of Company SWOT Analysis Using Sentence BERT - This study presented SWOT Sentence BERT as an AIaaS model that can intelligently automate company SWOT analysis. SWOT Sentence BERT is a sentence embedding model trained on SWOT text data processed in the form of a natural language inference task. To automate SWOT analysis, we applied the K-Means clustering algorithm to group the sentence embeddings into clusters and classified each sentence embedding based on its predicted cluster.
KAIST | Emotion Recognition - In this study, we propose a method for predicting the speaker’s emotions in naturalistic conversation using a speaker encoder and a counterpart encoder composed of CNN-LSTM deep learning networks. We used K-EmoCon, an emotion dataset collected during debates, to empirically evaluate our model. The results showed that the counterpart’s speech and physiological signals had a positive impact on predicting the speaker’s emotions.
Deepbrain AI | Video & Speech synthesis - We produced an AI Human through voice conversion using StarGAN-VC and image synthesis using FSGAN. By comparing various voice conversion models (CycleGAN, StarGAN-VC, StarGAN-VC2), we improved the quality of voice conversion by analyzing loss terms, the number of domain classes, batch size, and the number of iterations. In addition, the average similarity between the source video and the target video was used to facilitate video synthesis.
KAIST | Digital Therapeutics (DTx) - We develop fundamental technologies for data-driven digital therapeutics, including receptivity optimization for mobile digital therapeutics development. Furthermore, we analyze the effectiveness of digital treatments by applying causal analysis.
KAIST | Smart Mask - We review existing types of smart masks and study what sensor data can be collected through them. We also develop emotion and stress monitoring algorithms based on smart mask sensor data.
KAIST | Contact Tracing - We develop risk scoring algorithms for analyzing BLE-detected contact between client devices. In addition, the need for place beacons is growing as more people are infected with COVID-19 simply by staying in the same place as a confirmed patient, without direct contact. We also identify the range within which a place beacon can effectively exchange signals with client devices.
KAIST | [KSE801] Recommender System and Graph Machine Learning - We developed SRM (Stress Recommendation system with Mobile sensor data), a recommendation system to relieve human stress by utilizing mobile sensor data. Unlike previous recommendation systems using correlation and cosine similarity, SRM can more accurately identify and recommend factors that cause stress because it operates based on causal analysis using a counterfactual approach. (Course Project)
KAIST | [KSE531] Human-Computer Interaction: Theory and Design - We introduce a new interpretation of art experience by presenting CAN (Communicative Art Network), which allows viewers to achieve self-reliant thinking based on their art experience. CAN is a platform that consists of the CAN AR Comment System and CAN SNS. (Course Project)
KAIST | [KSE526] Analytical Methodologies for Big Data - We propose EFNet (Energy usage Forecasting Network), which forecasts energy usage using attention-based CNN-LSTM networks. EFNet is a sequential prediction model that reflects the characteristics of variables such as weather, calendar, oil price, and COVID-19 confirmed cases. (Course Project)
KAIST | [KSE801] Sensor Data Science - Wearable devices and smartphones can be used to track a person's physical and physiological conditions to determine the relationship between physical activity and stress level. The goal of the research was to predict people’s daily stress levels from three different sets of data: 1) physical data, 2) user info data, and 3) ESM data. (Course Project)

Experiences

Senior AI Consultant

2024.08 - Present
KPMG

I am currently employed at KPMG as a senior consultant, focusing on the development of an AI platform that offers a range of AI technologies such as Natural Language Understanding (NLU) and Natural Language Generation (NLG) to our clients.

Machine Learning Engineer

2022.09 - 2024.08
Okestro

I worked as a machine learning engineer focusing on NLP. Our team concentrated on generative models, in particular using Retrieval-Augmented Generation to build a chatbot capable of answering domain-specific queries. We used the LangChain library to simplify this process. We also worked on fine-tuning an open-source GPT-style model such as Llama to give it expertise in our specific domain.

Researcher

2020.08 - 2022.07
Interactive Computing Lab, KAIST

I participated in the Digital Therapeutics (DTx), Smart Mask, and Contact Tracing projects. I focused mainly on the DTx project, developing a platform that analyzes the effectiveness of digital treatments by applying causal analysis techniques such as matching and CCM. I also developed algorithms and interactive visualization platforms that use human biometric data to predict stress levels.

Director

2019.12 - 2020.07
Dep. of General Affairs, Student Government, HGU

I worked as the general affairs director in the student government. I managed a budget of about 100 million won for student expenses and established and executed funding plans for projects large and small. I also improved how organizational documents were maintained and updated, how necessary policies were implemented, and the policies and procedures related to human resources.

Data Analyst

2018.08 - 2020.07
Technological Entrepreneurship Lab, HGU

As an undergraduate researcher, I worked on data analysis at the Technological Entrepreneurship Lab, supervised by Doohee Chung. As a data analyst, I participated in various projects related to technological entrepreneurship. In particular, I used a number of statistical techniques, including hierarchical regression, to analyze moderating and mediating effects using SPSS, STATA, and AMOS.

Operation Manager

2018.03 - 2018.09
Handong English Camp, HGU

I worked as a general manager at an English camp for about 300 elementary and middle school students. I ran all the programs and operations for the two-month camp schedule. The whole process, from camp preparation through to the end of the camp, was conducted in English.

English Instructor

2017.03 - 2017.11
Global Vision Christian School (GVCS)

I taught English classes for about 100 elementary and middle school students at GVCS, located in Sejong City, South Korea. I served as an English instructor on equal footing with 30 other native speakers. In addition to teaching classes, I worked as an English interpreter at various events.

Accountant

2017.03 - 2019.08
International freshmen orientation, HGU

I served as the accountant for the orientation for 200 international freshmen for a total of 6 semesters. Each time, I managed a budget of 30 million won and set up and implemented a program funding plan. Since most of the staff were also international, we communicated in English from preparation through to the end of the orientation.

Publications

  • Beyond Pathogen Filtration: Possibility of Smart Mask as Wearable Device for Personal and Group Health and Safety Management
    Peter Lee, Heepyung Kim, Yongshin Kim, Woohyeok Choi, M. Sami Zitouni, Ahsan Khandoker, Herbert F. Jelinek, Leontios Hadjileontiadis, Uichin Lee, Yong Jeong
    Journal of Medical Internet Research, 2022

  • The Effect of Intellectual Property-Based Startups on Employment
    Haejun Jung, Yongshin Kim, Doohee Chung
    Korea Technology Innovation Society, 2019 [PPT]

Patents

  • Kim, Y., Lee, T., Jung, S., "AN INQUIRY MANAGEMENT SYSTEM USING CLASSIFICATION METHOD BASED IN CLOUD SERVICE AND A PLATFORM FOR INQUIRY-RESPONSE INTEGRATED MANAGEMENT"
  • Jeong, K., Kim, Y., Ahn, S., Lee, T., Kim, Y., Kim, M., "A CLOUD SERVER OPERATING SYSTEM IMPLEMENTING INDIVIDUAL VIRTUALIZATION OF RESOURCES AND A METHOD FOR OPERATING CLOUD SERVER"
  • Jung, S., Kwon, T., Lee, T., Kim, Y., Lee, C., Kim, J., "A CREATION MODULE FOR AUTOMATIC SWOT ANALYSIS TOOL USING ARTIFICIAL INTELLIGENCE AND A SWOT ANALYSIS SYSTEM COMPRISING THE SAME"

Personal Studies

Fun - Topics that I find fascinating and study in my free time.
Wiki - Useful information I have gathered as an engineer for studying and collaborating with others.

Skills & Proficiency

Python

SQL

R

SPSS & AMOS

STATA