
PEOPLE
The People of NICT #5
I want to lead the development of
cutting-edge technologies using the VoiceTra app,
on equal footing with global giants in the research of speech synthesis.


OKAMOTO Takuma (岡本 拓磨)
Senior Researcher
Advanced Speech Technology Laboratory
Advanced Speech Translation Research and Development Promotion Center
Universal Communication Research Institute

(At the time of the interview)
Senior Researcher
Advanced Speech Technology Laboratory
Advanced Speech Translation Research and Development Promotion Center

He completed the doctoral program at the Graduate School, Tohoku University. In 2012, he joined NICT after working as a COE Fellow at the School of Engineering, Tohoku University. He worked in the Multi-sensory Evaluation Laboratory and the Voice Communication Laboratory. In 2016, he joined the Advanced Speech Technology Laboratory. He has been in his current position since 2020.

Having been absorbed in studying sound and acoustics, I joined NICT. The biggest appeal of this environment is that we can enjoy freedom in conducting our research.

When I was a child, I loved machines so much that I would take a bicycle apart and put it back together. At university, I wanted to study robotics, but I could not find a position in the relevant laboratory. I was, however, as interested in audio as I was in machines, so I looked for and joined a laboratory where I could study sound and acoustics. When I first joined the laboratory, I was thinking of trying it for a year. Before I knew it, however, I had become absorbed in research on creating good sounds by using mathematics and physics. I have continued this research to this day.
Before joining NICT, I was conducting research at university on the recording and control of 3D sound fields for ultra-realistic communication. One day, I learned that NICT was looking for a researcher to work on adding 3D sound to stereoscopic images, which was my specialty, so I decided to join. At NICT, I conducted research on ultra-realistic communication for two years. Then, in 2014, I started research on speech communication. At that time, my superiors told me that I should also continue the research I had been working on. Encouraged by their support, I became able to produce results in cutting-edge research on speech technologies as well. At the academic conferences I attend, I always give presentations in both fields, speech communication and ultra-realistic communication. Because my superior is an understanding person, I have been able to broaden the scope of my research on the basis of my own proposals. I have not faced any communication problems here. At NICT, you will be able to freely conduct your own research if you achieve results in the research assigned to you.

I aim to conduct the world's most advanced research and development whose results can be returned to society while considering the balance between technology and usability.

Currently, I am engaged in the research and development of speech synthesis for multilingual speech translation technology. Speech synthesis is a technology that reads input text aloud naturally. Used in various applications such as VoiceTra,*1 it is an important technology for speech dialogue systems. What I consider my most important role here is to conduct research on the most advanced technologies in the world. In the field of multilingual speech translation, there are many powerhouses around the world. They promote cutting-edge research and present their results at a rapid pace. When I gave a presentation at a top conference, I was thrilled that a leading researcher in neural network-based speech synthesis casually came over to ask me questions. I felt rewarded for my hard work in completing the paper. The field has become exciting and inspiring. On the other hand, I do not think the current situation is sufficient from the perspective of returning research results to society. For example, if a state-of-the-art neural network model can produce good sound in speech synthesis, that can be written up in a paper. However, even if the sound quality is good, the model cannot be used in VoiceTra or other applications if synthesis is slow. Accordingly, we choose to incorporate into the system a method that can synthesize speech at high speed, sacrificing sound quality to some extent. In this process, new research projects may arise. Thus, it is important to think about a good balance between technology and usability in order to implement technology in the real world. On the basis of this concept, I work every day to conduct the world's most advanced research and achieve significant results.


*1: A multilingual speech translation app that has been open to the public as a demonstration experiment since 2010

What is important is "freedom and balanced capabilities." This can be achieved because there is an environment where you can devote yourself to research.

We are, of course, required to fulfill our responsibility to maintain and develop mission-oriented research. At the same time, I believe that we must also be allowed to conduct research on the basis of our own free ideas. This, I believe, will lead to the creation of new value.
Conducting research requires many important skills, such as the ability to gather information, creativity, development ability, and the ability to write papers. We also need programming skills beyond the level that our current research requires. However, no matter how good you are at writing papers, it is not enough to continue research on technologies that will never actually be used in society. Likewise, it is not enough to be able to develop a system; you also have to conduct research of high academic quality. At NICT, we believe that it is important to have balanced capabilities, as described above. I am grateful that our superiors at NICT take great care to ensure that researchers can produce good results. I think that we can devote ourselves to research because we have such an environment here. NICT provides us with abundant resources, research facilities, and the generous support of administrative and other support staff. For example, if I murmur to myself, "Oh, this research plan probably won't pass the screening," an assistant staff member will say to me, "How about changing this part like that to make it pass?" Even such small help means a lot to me.
NICT also accepts internship students, providing them with a state-of-the-art research environment. This, I believe, is one of the institute's major characteristics.

My research results

I would like to start a new research project, using my experiences in two fields: speech communication and ultra-realistic communication.

In recent years, speech synthesis has reached the level where real-time synthesis with almost the same sound quality as natural speech is possible using the latest neural network technology. In this fiscal year,*2 I also started research and development of cross-lingual speech synthesis that converts speech by a Japanese speaker into English. Meanwhile, I have been pursuing research and development of technology that realizes multiple sound zones, using the KAKENHI*3 fund that I was personally granted. This technology aims to realize ultra-realistic communication by generating areas where sound can be heard and areas where it cannot, using a large number of loudspeakers. My strength is that I conduct research in both speech processing and acoustic signal processing. Therefore, I hope to start new research in the future that integrates technologies from both fields. NICT incorporates many research institutes and centers in different areas within the information and communication field. I think it would be interesting if we could integrate our technologies through collaboration between departments. Furthermore, I aim to promote research and development that lets many people know how amazing NICT is. To this end, I will endeavor to produce substantial results in both fields of my research: the world's leading speech synthesis technology, and the research supported by my KAKENHI fund.


*2: FY2020

*3: Grants-in-Aid for Scientific Research Program (for Single-year Grants and Multi-year Fund) conducted by the Japan Society for the Promotion of Science 

My free time

I love drinking with my friends and colleagues. I also started jogging when I was in college, and I still run 10 km every day. While running, I can concentrate on thinking about my research and put everything else out of my mind. When I first started running, I was motivated by setting good records. Now, jogging has become an indispensable part of my life because it gives me time to think freely. On my days off, my family sometimes accompanies me on an electric bicycle. However, I have a hard time soothing my son when he becomes fussy while I am running.

CONTACT

If you have any questions regarding recruitment, please contact us below.

Recruitment Officer

Mail: jinji-r@ml.nict.go.jp
Address: Personnel Affairs Group, Personnel Affairs Office, General Affairs Department /
Strategic Planning Office, Strategic Planning Department,
National Institute of Information and Communications Technology (NICT)
4-2-1 Nukui-kitamachi, Koganei, Tokyo 184-8795, Japan