Spring 2024 Fellow Reflections
Misbah Imtiaz is a recent graduate of the University of Texas at Austin with a bachelor's in Computer Science. He served as the Tribune’s Spring 2024 Engineering & AI/ML Fellow.
What was your path to the Tribune? Why did you want to apply?
It was my last semester of college, and as I looked at my course schedule, I realized I had only two classes left to finish my degree. I could have kicked back and basked in the sun as I coasted through those final classes, but something caught my eye while I was scrolling on Handshake: Engineering Fellow at The Texas Tribune. Whoa. A news media company with an engineering team? What does that even look like? What kinds of problems are they solving? As a former debater interested in public policy and research, with a love for development and writing code, I let my curiosity and passion get the best of me, and I knew I had to apply. And yes, I know: why work when you can enjoy your senior year? Ever since freshman year, I’ve pushed myself to seek opportunities to grow as a developer through projects and internships. I wanted to do something meaningful during my final semester, and The Texas Tribune seemed like a perfect fit. Solving interesting engineering challenges at the intersection of journalism and machine learning, on a small engineering team with more scope and responsibility, sounded too good to be true. Luckily for me, it wasn’t, and I embarked on an incredible journey as an Engineering Fellow at The Texas Tribune.
What did you do during your fellowship? What have you learned?
At the start of the internship, I was given some introductory objectives to get familiar with Python, Docker, and SQL within the Tribune’s codebase. With Suraj’s help, I learned a plethora of new Docker commands and can now whip up containers with a Dockerfile and some YAML. My first task was to identify, process, and clean ‘explainer’ articles for a Texas-specific chatbot. It was super cool to apply techniques I learned in class, writing scripts that used Pandas and SQL queries to ingest more than 40,000 published articles and curate a smaller dataset for training purposes.
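As a rough illustration of that curation step, here is a minimal sketch that pulls explainer articles out of a SQL database and strips their HTML down to plain text. The schema (an `articles` table with `id`, `headline`, `body_html`, and `article_type` columns) and the idea that explainers are marked by an `article_type` value are hypothetical assumptions, not the Tribune’s actual data model. It also uses the standard library’s `sqlite3` and `html.parser` in place of Pandas so it runs on its own.

```python
import sqlite3
from html.parser import HTMLParser
from typing import Dict, List

# Tags that usually separate blocks of text; we insert a space around them
# so words from adjacent paragraphs are not glued together.
BLOCK_TAGS = {"p", "div", "li", "br", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, dropping the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment: str) -> str:
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    # Collapse runs of whitespace left behind by removed tags.
    return " ".join("".join(parser.parts).split())


def fetch_explainers(conn: sqlite3.Connection) -> List[Dict]:
    # Hypothetical schema: articles(id, headline, body_html, article_type).
    rows = conn.execute(
        "SELECT id, headline, body_html FROM articles "
        "WHERE article_type = ? ORDER BY id",
        ("explainer",),
    ).fetchall()
    return [
        {"id": row[0], "headline": row[1], "text": html_to_text(row[2])}
        for row in rows
        if row[2]  # skip articles with an empty body
    ]
```

The real pipeline would likely add more cleaning rules (bylines, embeds, boilerplate), but the shape stays the same: filter in SQL, then normalize the text in Python.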
My second major task was to find a way to transcribe audio from Tribune events into text with labeled speakers. I researched many open-source models for audio-to-text transcription and settled on WhisperX, a variation built on OpenAI’s Whisper model. It seemed perfect because it could transcribe long-form audio and also integrated with pyannote, which labels different speakers, a task known as speaker diarization. I created a Jupyter notebook documenting how to implement a pipeline that turns audio into text labeled by speaker. The pipeline produced complete transcripts of shorter audio, but balancing GPU credits against accuracy on longer-form content made outside vendors a much better fit than building our own solution from scratch.
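Transcription and diarization produce two separate timelines: timestamped text segments from the speech model and timestamped speaker turns from the diarization model. WhisperX handles merging them itself; the sketch below only illustrates the general idea with the standard library, assigning each text segment the speaker whose turns overlap it the longest. The `Segment` class, the tuple format for turns, and labels like `SPEAKER_00` are assumptions for illustration, not WhisperX’s or pyannote’s actual data structures.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A diarization turn: (start_seconds, end_seconds, speaker_label).
Turn = Tuple[float, float, str]


@dataclass
class Segment:
    start: float  # seconds from the start of the recording
    end: float
    text: str
    speaker: Optional[str] = None


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[Segment]:
    """Return copies of the segments, each labeled with its dominant speaker."""
    labeled = []
    for seg in segments:
        # Sum overlap per speaker, since one speaker may have several turns.
        totals: Dict[str, float] = {}
        for start, end, speaker in turns:
            shared = overlap(seg.start, seg.end, start, end)
            if shared > 0:
                totals[speaker] = totals.get(speaker, 0.0) + shared
        # Segments with no overlapping turn stay unlabeled (None).
        best = max(totals, key=totals.get) if totals else None
        labeled.append(Segment(seg.start, seg.end, seg.text, best))
    return labeled
```

Choosing the speaker with the most overlap (rather than the first match) keeps a segment that straddles a turn boundary attached to whoever spoke most of it.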
Last but not least, the largest project I worked on during this fellowship was the research and development of a text-to-audio player. Initially, the goal was to research vendors built on top of text-to-speech models and choose the most suitable one. After a series of meetings and documentation, Sydney and I narrowed the list down to Ad Auris. I then built an RSS feed in Django so the audio player could be integrated into the CMS. However, concerns about customizability opened up the opportunity to develop a player and pipeline from scratch, and I quickly jumped on it. I wrote a Python script that used the ElevenLabs API to generate audio from text. The next step was a monitoring system, built on top of an RSS feed, that generates audio for newly detected articles. Using AWS S3 storage, Docker, and cron-scheduled GitHub workflows, I produced a complete pipeline that generates audio files for articles and retrieves them through plugins in the CMS.
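The core of an RSS monitoring step like this can be sketched in three moves: parse the feed, compare each item’s identifier against those already processed, and hand only new articles to the audio step. This minimal sketch assumes an RSS 2.0 feed whose items carry a `<guid>` (falling back to `<link>`). It keeps processed IDs in a set rather than whatever store the real pipeline used, and it takes a hypothetical `generate_audio` callback standing in for the ElevenLabs call and S3 upload.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterable, List, Set


def parse_feed_items(rss_xml: str) -> List[Dict[str, str]]:
    """Extract guid, title, and link for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        link = item.findtext("link", default="").strip()
        # Fall back to the link when an item has no (or an empty) <guid>.
        guid = (item.findtext("guid") or link).strip()
        title = item.findtext("title", default="").strip()
        items.append({"guid": guid, "title": title, "link": link})
    return items


def find_new_articles(rss_xml: str, processed: Iterable[str]) -> List[Dict[str, str]]:
    """Return feed items whose identifier has not been processed yet."""
    seen = set(processed)
    return [
        item
        for item in parse_feed_items(rss_xml)
        if item["guid"] and item["guid"] not in seen
    ]


def process_new_articles(
    rss_xml: str,
    processed: Set[str],
    generate_audio: Callable[[Dict[str, str]], None],
) -> List[str]:
    """Run one monitoring pass; returns the identifiers handled this pass."""
    handled = []
    for article in find_new_articles(rss_xml, processed):
        generate_audio(article)  # hypothetical hook: text-to-speech + upload
        # Record the ID only after audio generation succeeds.
        processed.add(article["guid"])
        handled.append(article["guid"])
    return handled
```

Because each ID is recorded only after its audio is generated, a scheduled run that fails partway through will pick up the remaining articles on the next run, provided the processed set is persisted between runs.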
Many smaller tasks were sprinkled in between, but these major projects capture the core of what I learned during this fellowship.
What was the most surprising part of the fellowship?
The most surprising part of the fellowship was how much such a small engineering team could build. From creating and maintaining an in-house content management system to constantly scoping and planning new initiatives that advance journalism through software, it was inspiring to watch projects come to fruition through the dedication and skill of Ashley, Suraj, Jonathan, and Mathew. There was always something on their plate, and there was no limit to their vision for making The Texas Tribune a leader in digital journalism. It was an honor to work with such great developers!
What is your favorite memory from the fellowship?
While I can’t pinpoint one specific memory of the fellowship as my favorite, I really enjoyed all the one-on-one meetings I had with Ashley and Darla. From discussing promising trends with generative AI in journalism to receiving general advice on how to be a successful entry-level engineer, their mentorship, advice, and guidance were invaluable. Our discussions not only helped me grow as a developer but also provided me with insights into the industry that I wouldn't have gained otherwise. Whether we were troubleshooting code, brainstorming new ideas, or just sharing our experiences, these interactions were hugely beneficial and made a lasting impact on my professional journey.
What is your advice for anyone applying?
Don’t be shy: make the most of the fellowship by staying engaged and actively seeking out new engineering tasks. At the same time, when you’re stuck, ask for help; the engineering team here is super open and responsive. You’ll have a blast. Best of luck!