
Young Researchers’ Symposium on Natural Language Processing (YANS) 2025

My first conference experience.

*The content of this blog reflects my personal experiences and opinions only.

Making of a “Team”

From “I should try to do some research” to “I am a little nervous, there are a lot of people”, I had some great experiences that I would like to share. Once research projects were proposed in our team, Danny, Rohith, and I opted to explore recent advancements in Document AI. Danny (or Dr Siu, as he holds a PhD in Behavioral Neuroscience) agreed to advise two guys with bachelor’s degrees and help them understand the basics of research.


As it was the first time for all of us, we expected the ride to be a little bumpy. That’s where I learned a real-life application of an ML concept: being part of an ensemble, rather than acting as an individual, can smooth out the bumps and lead to more stable outcomes.

The Journey Begins

We started by defining the problem statement. While we knew we wanted to catch up on and experiment with recent advancements in Document AI, finding a suitable dataset and models that could perform the task was the hard part.

We struggled to choose between the SROIE and CORD datasets, but settled on CORD due to its hierarchical annotation scheme. Once the dataset was finalised, we started looking for suitable models. We reviewed several research papers and identified two main contenders, LayoutLMv3 and SmolVLM, to evaluate against DONUT, which was considered the SOTA.
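For context, CORD pairs each receipt image with a nested JSON annotation, which is what made it attractive for us. Here is a minimal sketch of inspecting that structure, assuming the public naver-clova-ix/cord-v2 release on the Hugging Face Hub:

```python
# A quick look at CORD's hierarchical annotations, assuming the
# naver-clova-ix/cord-v2 release on the Hugging Face Hub.
import json

from datasets import load_dataset

# Each sample pairs a receipt image with a JSON ground truth string.
cord = load_dataset("naver-clova-ix/cord-v2", split="train")

sample = cord[0]
ground_truth = json.loads(sample["ground_truth"])

# "gt_parse" holds the nested (hierarchical) annotation scheme,
# e.g. menu -> [{nm, cnt, price}, ...], sub_total, total.
print(ground_truth["gt_parse"].keys())
```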

The Evaluation Process

Now that the dataset and models had been selected, all we needed to do was either fine-tune them or evaluate their out-of-the-box performance. Easy peasy, or so I thought; once I started working on the models, I realised it was not that straightforward.

Each model has its own quirks: some cannot generate tokens at all, some can but are pretrained for generalised applications, and some were built to solve a similar task but pretrained on a different format of data, which makes it harder for them to understand receipts.

While the above problems relate to the architectures and pre-training, fine-tuning cannot be done using the same framework for all models either, as the outputs and number of parameters vary.
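To make that mismatch concrete, here is an illustrative sketch of how differently two of the families are even loaded (the checkpoint names are the public Hugging Face ones; the label count is a placeholder):

```python
# Illustrative only: these two families expose different heads, so one
# fine-tuning loop cannot serve both out of the box.
from transformers import (
    LayoutLMv3ForTokenClassification,
    VisionEncoderDecoderModel,
)

# LayoutLMv3 classifies each input token (e.g. BIO tags over OCR words);
# it cannot generate free-form output. The label count depends on the dataset.
layoutlmv3 = LayoutLMv3ForTokenClassification.from_pretrained(
    "microsoft/layoutlmv3-base", num_labels=30
)

# DONUT is an OCR-free encoder-decoder: it generates a structured output
# sequence (XML-like tags) directly from the image pixels.
donut = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base")
```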

Working on the evaluation gave me a deeper understanding of the model architectures and the pros and cons of using each of them. Rohith was going through a similar process. Since he had experience training and fine-tuning models, he was comfortable developing a framework to simplify these tasks, while I still struggled with vanilla code.

Rohith also found one more model, Pix2Struct, that had the potential to perform well on the problem at hand. We ran into difficulties evaluating SmolVLM due to technical restrictions and a lack of time, so it was dropped from the final draft of the poster.

While completing the evaluation, I searched for other models that could mitigate the problems with the current ones. That’s when I discovered UDOP, which solved LayoutLMv3’s limitation of not being able to generate tokens and became one of DONUT’s main competitors on the poster.
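A hedged sketch of what that limitation being solved looks like in practice: UDOP keeps LayoutLMv3-style layout inputs but adds a T5-style decoder, so it can generate answers rather than only tag tokens. The image path and prompt below are placeholders of my own:

```python
# Hedged sketch: UDOP takes layout-aware inputs like LayoutLMv3, but its
# decoder lets it *generate* tokens via .generate().
from PIL import Image
from transformers import UdopForConditionalGeneration, UdopProcessor

processor = UdopProcessor.from_pretrained("microsoft/udop-large")
model = UdopForConditionalGeneration.from_pretrained("microsoft/udop-large")

image = Image.open("receipt.png").convert("RGB")  # placeholder path
# By default the processor runs OCR (Tesseract) to obtain words and boxes.
inputs = processor(
    images=image,
    text="Question answering. What is the total?",  # illustrative prompt
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=20)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```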

Finally, after some discussions with Danny, we decided to include an out-of-the-box evaluation of a recent VLM, Qwen2-VL-7B-Instruct, to estimate how far we could get without any training.
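The out-of-the-box setup amounts to zero-shot prompting. A minimal sketch of that, assuming transformers >= 4.45 and with my own placeholder image path and prompt wording:

```python
# Hedged sketch of zero-shot receipt parsing with Qwen2-VL-7B-Instruct:
# no fine-tuning, just a prompt over the image.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

image = Image.open("receipt.png").convert("RGB")  # placeholder path
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text",
         "text": "Extract the line items and total from this receipt as JSON."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt.
print(processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:],
                             skip_special_tokens=True)[0])
```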

Poster Creation

After we had all the evaluation results, it was time to put together a poster to present. The evaluations took longer than we had estimated, which made it harder to squeeze poster creation into our schedules. While we had started on the basic layout and gathered the results, we still needed to make good charts and tables and finalise the content.

Much like in my university days, we gathered at Rohith’s apartment on a Sunday to complete the first draft. We divided the tasks of creating charts to show the results and finalising the content for the future directions and conclusion.

Some small things made the day memorable: the walk to the supermarket to buy watermelon and ice cream to cope with the Tokyo summer, and the more-expensive-than-normal orange juice and pudding that Danny brought from a special store near his house.

The D-Day


The first day was an NLP hackathon with a problem statement covering general and recent LLM techniques. We were supposed to modify and refine a dataset using third-party LLMs to make it suitable for training a small in-house LLM. While it did not deal with very advanced concepts, it gave me a deeper understanding of basics like CoT (chain of thought) and LoRA (low-rank adaptation). Coding alongside bachelor’s and master’s students, I found myself reminiscing about my college days. We managed to secure third prize in the hackathon.
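For readers unfamiliar with LoRA: instead of updating all of a model’s weights, you train small low-rank adapter matrices injected into chosen layers. A minimal sketch with the peft library, in the spirit of the hackathon; the model name and hyperparameters are illustrative, not what we actually used:

```python
# A minimal LoRA setup sketched with the peft library; the base model
# and hyperparameters here are illustrative stand-ins.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in small LLM

config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor for the update
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
# Only the small adapter matrices are trainable, not the base weights.
model.print_trainable_parameters()
```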


The second day was my first experience watching people present their work. Presenters weren’t expected to share very complex research, as the conference is aimed at people at the beginning of their research careers, but I found some very interesting posters. At the end of the day, we had a round table that let us interact with people working in the field. I met some interesting people there: one was leading LLM utilisation at JAXA (Japan Aerospace Exploration Agency), while two others were working at LY Corporation, providing services that integrate the NLP functionalities of Yahoo and Line.


On the third and last day, we presented our poster. At the start, I was nervous, as it was my first time presenting at a conference, but after a few presentations I started enjoying talking with people from different backgrounds who were trying to solve similar problems. I had some fun interactions when people noticed the smaller details and asked questions about the choices we had made. I also met researchers who were working on a different problem but, like us, were trying to utilise VLMs.


Takeaway

  • The Power of the Ensemble: A team with diverse skills can overcome individual struggles and “smooth out the bumps” to achieve a better outcome.
  • Research is an Adaptive Process: The plan was not rigid. We struggled with datasets (SROIE vs. CORD), had to drop a model due to technical issues (SmolVLM), and discovered better alternatives along the way (UDOP, Pix2Struct). This highlights that good research involves flexibility and the ability to pivot when faced with obstacles.
  • Sharing Your Work is a Reward in Itself: The most fulfilling part of the research cycle was interacting with others. Engaging in detailed discussions, answering questions about specific choices, and connecting with people working on similar problems validates the hard work and opens doors for future collaboration.

Poster

[Poster image]
