LATEST RESEARCH FROM CLPS INNOVATION LAB – INTELLIGENT RECOMMENDATION SYSTEM FOR TALENTS (IRST) POWERED BY BIG DATA TECHNOLOGY
CLPS Innovation Lab, CLPS’s research center, focuses on the research and application of cutting-edge technologies in the fintech industry, including blockchain, cloud computing, big data, and artificial intelligence. It aims to drive the next upgrade of CLPS’s technology-driven solutions and product innovation.
The Intelligent Recommendation System for Talents (IRST) research aims to take big data techniques already applied in the financial industry, such as precision marketing based on user profiles, and apply them to talent matching for IT project delivery. CLPS will use its existing operations as the application field to explore how big data can improve operational efficiency and to develop more application scenarios.
Why do we need IRST?
According to a group of international IT services providers, information disparity between talents and project requirements restricts the delivery efficiency of IT services in a large number of projects.
Although a talent pool may hold hundreds of thousands or even millions of people covering all kinds of skill sets and regions, every industry needs specific skills, so how promptly the right talent can be provided to a project directly affects the efficiency of IT services.
Previously, the first step in matching the right talent to a project was to analyze the demand. Candidates were then shortlisted through one or more searches by category until the most suitable candidate was determined.
In this process, the recruiter’s subjective skill was the most important prerequisite for a successful match. If the project demand was not analyzed accurately, the whole search went in the wrong direction. In addition, traditional talent search functions are time consuming and cannot guarantee accurate results.
Upgrading the résumé matching function of the legacy talent system with big data technology makes it possible to match the right talent to the project accurately. This is how the intelligent recommendation system achieves its effectiveness.
New recommendation plan with big data approach
Formulating the recommendation scheme and selecting the core algorithm are the most difficult parts of building a mature recommendation system. Because IRST has a specific requirement, namely matching IT professionals to IT projects, the research group starts from a general view of the information technology field and works toward intelligent recommendation step by step. The development plan is as follows:
- Build IT skills keyword archive. The archive serves as the basis for optimizing and improving the accuracy of searching.
- The keyword archive serves as an extended keyword list of the talent big data search engine to improve the matching degree of the search.
- The full text of the project requirements is introduced as the original search condition, and the skill information is analyzed intelligently to match the condition of the talent in the big data servers.
- Analyze and extract further project requirements, such as education and years of work experience, as supplementary matching conditions to improve the final matching hit ratio.
- Mine more talent fields, such as the preferred work location, to further improve the matching result.
- Customize the flow of the forward maximum matching algorithm so that the analysis framework handles English, Chinese, and mixed Chinese-English data.
- Build dynamic weight rules for deployment without restarting the system.
- When project demand reaches a certain volume, switch from “recommending talents to the project” to “recommending projects to the talent”, which makes matching significantly faster.
System architecture of IRST based on the big data platform
IRST uses Elasticsearch (ES) as its foundation platform because it searches quickly, is highly reliable, and is both flexible and extensible. The architecture design takes full account of data synchronization, data fault tolerance, and extensibility for data analysis. The system architecture diagram is as follows:
1) External Display Layer
The external display remains the same as in the legacy talent resource database system, such as the ERP of the company. The architecture and logic of the legacy system and the users’ operation habits are kept unchanged.
IRST provides APIs for the legacy talent resource database system to match and synchronize data. The APIs are built with the popular Spring Boot framework, which is well suited to building RESTful web services quickly.
3) Core Functional Layer
“Core Services” handle the actual processing of API requests, carry out talent searches against ES, and synchronize talent data from the talent database system to ES and HBase.
“Job for Batch Data” calculates indicators in advance and standardizes talent data. It is written in Scala, which requires less code while keeping development efficient, and runs on Spark components, which provide in-memory computing.
4) Foundation Platform Layer
Lucene-based ES is used as the fundamental search engine for talent data matching.
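As a rough illustration of how extracted skills and supplementary conditions might be turned into an ES search, the sketch below builds a query body using the standard Elasticsearch bool query. The index field names (`skills`, `years_of_experience`, `preferred_location`) and the boost values are assumptions made for this example; the article does not describe the actual index mapping.

```python
def build_talent_query(skills, min_years=None, location=None, size=20):
    """Build an Elasticsearch query body (a plain dict) for talent matching.

    skills: mapping of skill keyword -> boost (hypothetical weights).
    Field names are illustrative assumptions, not the real IRST schema.
    """
    if not skills:
        raise ValueError("at least one skill keyword is required")

    # Each skill is an optional clause; higher boost means more influence on score.
    should = [
        {"match": {"skills": {"query": skill, "boost": boost}}}
        for skill, boost in skills.items()
    ]

    # Supplementary conditions filter results without affecting the score.
    filters = []
    if min_years is not None:
        filters.append({"range": {"years_of_experience": {"gte": min_years}}})
    if location is not None:
        filters.append({"term": {"preferred_location": location}})

    return {
        "size": size,
        "query": {
            "bool": {
                "should": should,
                "minimum_should_match": 1,
                "filter": filters,
            }
        },
    }


query = build_talent_query({"java": 2.0, "spark": 1.0}, min_years=3)
```

The resulting dict can be serialized to JSON and sent to the search endpoint of an ES index. Putting experience and location in `filter` rather than `should` keeps them as hard conditions while the skill clauses drive the ranking.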
ZooKeeper is used to coordinate the infrastructure services and to manage dynamic parameters and switch parameters centrally. As a result, updated system parameters take effect without restarting the system.
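The idea of changing parameters without a restart can be sketched as follows. This is a simplified stand-in, not the IRST code: the `DynamicWeights` class, the JSON payload format, and the feature names are hypothetical, and in a real deployment a ZooKeeper client watch on a configuration node would be what calls `on_config_change`.

```python
import json
import threading


class DynamicWeights:
    """Holds matching weights that can be swapped while the service is running."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._weights = dict(initial)

    def on_config_change(self, raw_bytes):
        # Assumed to be invoked by a configuration watcher with the new node data.
        new_weights = json.loads(raw_bytes.decode("utf-8"))
        with self._lock:
            self._weights = new_weights

    def snapshot(self):
        # Return a copy so a request scores against one consistent set of weights.
        with self._lock:
            return dict(self._weights)


def score(features, weights):
    """Weighted sum of per-candidate feature values; unknown features weigh 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


weights = DynamicWeights({"skills": 1.0, "experience": 0.5})
candidate = {"skills": 0.8, "experience": 1.0}

before = score(candidate, weights.snapshot())
weights.on_config_change(b'{"skills": 2.0, "experience": 0.5}')  # simulated update
after = score(candidate, weights.snapshot())
```

Each request takes a snapshot of the weights, so an update that arrives mid-request does not produce a score computed from a mix of old and new values.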
Redis is used for caching to avoid repeated calculation, reduce system load, and improve system response time.
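A common way to use a cache for this purpose is the cache-aside pattern, sketched below. To keep the example self-contained, a small in-memory `TtlCache` class stands in for Redis; the key format, the TTL value, and the function names are assumptions for illustration only.

```python
import hashlib
import json
import time


class TtlCache:
    """In-memory stand-in for a key-value cache with expiry (Redis in the article)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


def cached_match(requirement_text, cache, compute, ttl_seconds=300):
    """Return a cached matching result, computing and storing it on a miss."""
    digest = hashlib.sha256(requirement_text.encode("utf-8")).hexdigest()
    key = "match:" + digest
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    result = compute(requirement_text)
    cache.set(key, json.dumps(result), ttl_seconds)
    return result


calls = []

def expensive_match(text):
    # Hypothetical placeholder for the real matching computation.
    calls.append(text)
    return ["talent-1", "talent-2"]

cache = TtlCache()
first = cached_match("Java developer, 3 years", cache, expensive_match)
second = cached_match("Java developer, 3 years", cache, expensive_match)
```

The second call with the same requirement text is served from the cache, so the matching computation runs only once until the entry expires.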
Kafka, with its excellent performance, decouples the data pipelines, reduces the complexity of interactions between internal modules, guards against data loss, and improves the overall reliability of the system.
In addition, HBase runs on top of HDFS as a backup database, and Spark performs offline analysis on the HBase data. Since ES is not coupled to HBase, offline analysis does not interfere with the main search function.
Performance tests showed that a single service node supports up to roughly 440 simultaneous matching requests. Based on this result and an evaluation of the concurrent volume the business actually requires, three service nodes were configured in the production environment. Nginx is used for reverse proxy and load balancing to meet the requirements of high availability (HA) and continuous delivery.
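The sizing reasoning can be written out as simple arithmetic. The per-node figure of roughly 440 comes from the text; the peak demand figures below are purely hypothetical, since the article does not state the actual business volume, and the `spare_nodes` parameter is an assumed way to express HA headroom.

```python
import math

PER_NODE_CAPACITY = 440  # approximate figure from the performance test above


def nodes_needed(peak_concurrent_requests, per_node=PER_NODE_CAPACITY, spare_nodes=0):
    """Smallest node count whose combined capacity covers the peak, plus spares."""
    if peak_concurrent_requests < 0 or per_node <= 0 or spare_nodes < 0:
        raise ValueError("inputs must be non-negative and per_node positive")
    base = max(1, math.ceil(peak_concurrent_requests / per_node))
    return base + spare_nodes


# Combined capacity of the three production nodes, ignoring proxy overhead.
cluster_capacity = 3 * PER_NODE_CAPACITY

# Hypothetical peak demand values, for illustration only.
example_a = nodes_needed(1000)
example_b = nodes_needed(800, spare_nodes=1)
```

Under these assumptions, three nodes give a combined capacity of about 1,320 simultaneous requests, and either a peak near 1,000 or a peak near 800 with one spare node for failover would lead to a three-node setup.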
Simple and easy operation of IRST
With the upgraded Intelligent Recommendation System for Talents (IRST), matching results are generated automatically with a single click, and the priority of each recommendation is clearly shown to support decision making.
- IRST realizes the automation and intelligence of the project analysis process, saves time of manual analysis, and avoids the uncontrollable influence caused by subjective factors such as experience and cognition.
- IRST checks the entire database, significantly narrows the screening range, and reduces the cost in manpower and time.
- Talent matching capability has been greatly improved. A project analysis task that previously required a team can now be completed by one person in a few seconds.
- Talent matching accuracy has been significantly improved. Result sets that used to number in the thousands are now reduced to hundreds, and the matching success rate is about 80%.
- The required manpower has been significantly reduced. An operation that used to depend heavily on industry experience can now be handled by junior staff with basic training.
- IRST is highly compatible with the legacy core system with seamless connection between the data layer and the business layer on the two systems. It significantly simplifies the legacy operating process without affecting the users’ operation habits.
IRST research also provides additional ideas on the following topics:
- Intelligent recommendation system for other business areas. Although IRST focuses on matching talent to IT projects, accurate matching and intelligent recommendation are also applicable to other business areas.
- Skill Graph. Building on the IT skills keyword archive adopted by IRST, a skills knowledge graph built through a combination of manual curation and machine learning is also in the pipeline. Its purpose is to show the relationships among skills from both business and technical perspectives and to better understand the specific skills required.
- Integrated management platform. Adopted in the development of IRST, the platform automates operations in the development and testing process, including one-click deployment, process control, and process/cluster resource monitoring, so that release and monitoring across the development, testing, and production environments are handled efficiently.