Introduction to Natural Language Processing (NLP) in Data Science

Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) and linguistics that focuses on the interaction between computers and human languages. It encompasses a range of techniques aimed at enabling computers to understand, interpret, and generate human language in a manner that is both meaningful and contextually relevant. 

In data science, NLP plays a pivotal role in extracting insights from vast amounts of textual data. Through techniques such as text classification, sentiment analysis, named entity recognition, and language translation, NLP empowers data scientists to analyze and derive actionable insights from unstructured text data sources such as social media, customer reviews, emails, and news articles. By harnessing the power of NLP, data scientists can uncover patterns, trends, and sentiments within textual data. This enables organizations to make data-driven decisions and enhance various aspects of their operations, from customer service to product development and market analysis.

NLP is fundamental to generative AI models like ChatGPT. Natural language processing techniques enable these models to understand and generate human-like text, making them capable of engaging in meaningful conversations with users. NLP provides the framework for tasks such as language understanding, sentiment analysis, summarization, and language generation. All are essential components of generative AI systems.

Applications of NLP

NLP techniques are extensively utilized in text classification and sentiment analysis, offering a wide array of applications across various industries.

Text Classification

NLP enables automatic categorization of textual data into predefined classes or categories. Applications include:

  • Spam detection: NLP algorithms can classify emails or messages as spam or non-spam, helping users manage their inbox efficiently.
  • Topic classification: NLP models categorize news articles, research papers, or social media posts into relevant topics, aiding in content organization and information retrieval.
  • Language identification: NLP models can identify the language of a given text, which is useful for multilingual platforms and content analysis.
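
As a rough illustration of how such a classifier works under the hood, here is a minimal multinomial Naive Bayes spam detector in pure Python. The four training messages are invented toy data; real systems train on large labeled corpora and typically use libraries such as scikit-learn.

```python
from collections import Counter, defaultdict
import math

# Toy training data, invented for illustration only.
train = [
    ("win money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch plans tomorrow", "ham"),
]

word_counts = defaultdict(Counter)  # per-class token frequencies
class_counts = Counter()            # number of documents per class
vocab = set()
for text, label in train:
    tokens = text.lower().split()
    word_counts[label].update(tokens)
    class_counts[label] += 1
    vocab.update(tokens)

def predict(text):
    """Return the class with the highest log posterior score."""
    tokens = text.lower().split()
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Log prior plus log likelihoods with add-one (Laplace) smoothing
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(predict("claim your free money"))  # spam
print(predict("lunch meeting agenda"))   # ham
```

Even on this tiny vocabulary, the smoothed word frequencies are enough to separate the two classes; the same mechanics scale up to real inboxes with larger corpora and better tokenization.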

Sentiment Analysis

NLP techniques are employed to analyze the sentiment or emotion expressed in textual data, providing valuable insights for decision-making. Applications include:

  • Brand monitoring: Sentiment analysis helps businesses monitor online conversations about their brand, products, or services. This enables them to gauge public perception and address potential issues promptly.
  • Customer feedback analysis: NLP algorithms analyze customer reviews, surveys, and social media comments to understand customer sentiment toward specific products or services, facilitating product improvement and customer satisfaction.
  • Market research: Sentiment analysis aids in analyzing public opinion and sentiment towards specific topics or events, providing valuable insights for market research, trend analysis, and forecasting.
  • Social media analysis: NLP techniques are utilized to analyze sentiment in social media posts, tweets, and comments, enabling businesses to track customer sentiment, identify influencers, and engage with their audience effectively.
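
At its simplest, sentiment analysis can be sketched as lexicon-based scoring. The tiny word lists below are illustrative stand-ins for curated resources such as the VADER lexicon, which also handles intensity and negation:

```python
# Illustrative mini-lexicons; real systems use much larger curated lists.
POSITIVE = {"great", "love", "excellent", "happy", "good"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "awful"}

def sentiment_score(text):
    # Count positive and negative words after light punctuation stripping
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment_score("I love this product, great quality!"))   # positive
print(sentiment_score("Terrible service and poor packaging."))  # negative
```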

NLP Techniques

NLP encompasses a variety of techniques designed to enable computers to understand and process human languages. Two fundamental techniques in NLP are tokenization and stemming, which play crucial roles in text preprocessing and analysis.


Tokenization

Tokenization is the process of breaking down a piece of text into smaller units called tokens. These tokens can be words, phrases, or other meaningful elements. The primary goal of tokenization is to divide the text into individual units for further analysis. There are different tokenization strategies, including:

  • Word tokenization divides the text into words or word-like units. For example, the sentence “The quick brown fox jumps over the lazy dog” would be tokenized into [“The”, “quick”, “brown”, “fox”, “jumps”, “over”, “the”, “lazy”, “dog”].
  • Sentence tokenization splits the text into sentences. For instance, the paragraph “Natural Language Processing (NLP) is a fascinating field. It involves analyzing and understanding human language” would be tokenized into [“Natural Language Processing (NLP) is a fascinating field.”, “It involves analyzing and understanding human language.”].
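
Both strategies can be sketched with Python's built-in `re` module; production systems typically rely on dedicated tokenizers such as those in NLTK or spaCy, which handle punctuation, contractions, and abbreviations more robustly.

```python
import re

# Word tokenization with a simple regex
sentence = "The quick brown fox jumps over the lazy dog"
word_tokens = re.findall(r"\w+", sentence)
print(word_tokens)

# Naive sentence tokenization: split after terminal punctuation
paragraph = ("Natural Language Processing (NLP) is a fascinating field. "
             "It involves analyzing and understanding human language.")
sentences = re.split(r"(?<=[.!?])\s+", paragraph)
print(sentences)
```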


Stemming

Stemming is the process of reducing words to their root or base form, known as the stem. The goal of stemming is to normalize words so that different forms of the same word are treated as identical. Stemming algorithms apply heuristic rules to remove suffixes and prefixes from words. For example:

  • Original Word: “Running”
  • Stemmed Word: “Run”
  • Original Word: “Jumped”
  • Stemmed Word: “Jump”
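
The examples above can be reproduced with a deliberately crude suffix-stripping function, shown here for illustration only; real stemmers such as the Porter algorithm (available as `nltk.stem.PorterStemmer`) apply a much richer rule set.

```python
def crude_stem(word):
    """Strip a common suffix; a toy stand-in for a real stemming algorithm."""
    w = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            break
    # Collapse a doubled final consonant left behind ("runn" -> "run")
    if len(w) >= 2 and w[-1] == w[-2] and w[-1] not in "aeiou":
        w = w[:-1]
    return w

for word in ("Running", "Jumped", "Jumps"):
    print(word, "->", crude_stem(word))  # run, jump, jump
```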

Stemming is particularly useful in tasks such as text mining, information retrieval, and search engines, because reducing words to their base forms can improve indexing and retrieval accuracy. Both tokenization and stemming are essential preprocessing steps in many NLP applications, including text classification, sentiment analysis, machine translation, and information retrieval. These techniques help transform raw textual data into a format suitable for further analysis and modeling, facilitating the extraction of meaningful insights from large volumes of text data.

Natural Language Processing (NLP) Resources

Given the comprehensive overview of NLP’s applications and techniques, several resources can significantly aid in deepening your understanding and skills in this field. Books such as Natural Language Processing in Action by Lane, Howard, and Hapke, and Speech and Language Processing by Jurafsky and Martin provide foundational knowledge and practical examples. These texts are excellent for understanding the underlying principles of NLP. They’re also great for reference on specific topics like tokenization, stemming, and machine learning models used in NLP. Regardless of which NLP resource is used, the key is to practice coding the models.

Learn More about NLP in Flatiron’s Data Science Bootcamp

Flatiron School’s Data Science Bootcamp teaches natural language processing, data analysis and engineering, machine learning fundamentals, and much more. Full-time and part-time enrollment opportunities await! Apply today or schedule a call with Admissions to learn more about what Flatiron can do for you and your career.

Enhancing Your Tech Career with Remote Collaboration Skills

Landing a career in the tech industry requires more than just technical/hard skills; it requires soft skills like effective communication, adaptability, time management, problem-solving abilities, and remote collaboration skills. Remote collaboration is especially key for those who work in tech; according to U.S. News & World Report, the tech industry leads all other industries with the highest percentage of remote workers.

At Flatiron School, we understand the importance of these skills in shaping successful tech professionals. Hackonomics, our AI-focused hackathon event happening between March 8 and March 25, will see participants sharpen remote collaboration skills (and many others) through the remote team-based building of an AI-driven personal finance platform. We’ll reveal more about Hackonomics later in the article; right now, let’s dive deeper into why remote collaboration skills are so important in today’s work world.

Mastering Remote Collaboration Skills

Remote collaboration skills are invaluable in today’s digital workplace, where teams are often distributed across different locations and time zones. Whether you’re working on a project with colleagues halfway across the globe or collaborating with clients remotely, the ability to effectively communicate, problem-solve, and coordinate tasks in a remote work setting is essential for success. Here are some other key reasons why this skill is becoming so important. 

Enhanced Productivity and Efficiency

Remote collaboration tools and technologies empower teams to communicate, coordinate, and collaborate in real-time, leading to increased productivity and efficiency. With the right skills and tools in place, tasks can be completed more quickly, projects can progress smoothly, and goals can be achieved with greater ease.

Flexibility and Work-life Balance

Remote work offers unparalleled flexibility, allowing individuals to balance their professional and personal lives more effectively. However, this flexibility comes with the responsibility of being able to collaborate effectively from anywhere, ensuring that work gets done regardless of physical location.

Professional Development and Learning Opportunities

Embracing remote collaboration opens doors to a wealth of professional development and learning opportunities. From mastering new collaboration tools to honing communication and teamwork skills in virtual settings, individuals can continually grow and adapt to the evolving demands of the digital workplace.

Resilience in the Face of Challenges

Events such as the COVID-19 pandemic—and the massive shift to at-home work it caused—have highlighted the importance of remote collaboration skills. When faced with unforeseen challenges or disruptions, the ability to collaborate remotely ensures business continuity and resilience, enabling teams to adapt and thrive in any environment.

Join Us for the Hackonomics Project Showcase and Awards Ceremony

Come see the final projects born out of our Hackonomics teams’ remote collaboration experiences when our Hackonomics 2024 Showcase and Awards Ceremony happens online on March 28. The event is free to the public and offers those interested in attending a Flatiron School bootcamp a great opportunity to see the types of projects they could work on should they enroll.

The 8 Things People Want Most from an AI Personal Finance Platform

Great product design is one of those things you just know when you see it, and more importantly—use it. It’s not just about being eye-catching; it’s about serving a real purpose and solving a real problem—bonus points if you can solve that problem in a clever way. If there ever was a time to build a fintech app, that time is now. The market is ripe, the problems to solve are plenty, and the tools and resources are readily available. Flatiron School Alumni from our Cybersecurity, Data Science, Product Design, and Software Engineering bootcamps have been tasked to help me craft Money Magnet, an AI personal finance platform that solves common budget-making challenges. They’ll tackle this work during Hackonomics, our two-week-long hackathon that runs from March 8 to March 25.

There is one goal in mind: to help individuals and families improve their financial well-being through an AI financial tool.

A loading screen mockup for AI personal finance platform Money Magnet

My Personal Spreadsheet Struggle

The concept for Money Magnet sprang from personal frustration and mock research around user preferences in AI finance. As a designer, I often joke, “I went to design school to avoid math.” Yet, ironically, I’m actually quite adept with numbers. Give me a spreadsheet and 30 minutes, and I’ll show you some of the coolest formulas, conditional formats, and data visualization charts you’ve ever seen.

Despite this, in my household, the responsibility of budget management falls squarely to my partner. I prefer to stay blissfully unaware of our financial details—knowing too much about our funds admittedly tends to lead to impulsive spending on my part. However, occasionally I need to access the budget, whether it’s to update it for an unexpected expense or to analyze historical data for better spending decisions.

We’re big on goal-setting in our family—once we set a goal, we stick to it. We have several future purchases we’re planning for, like a house down payment, a new car, a vacation, and maybe even planning for children. 

But here’s the catch: None of the top AI financial tools on the market incorporate the personal finance AI features that Money Magnet proposes bringing to market. Families need an AI personal finance platform that looks at past spending patterns and projects into the future to tell users when the budget will get tighter. The product should be easy to use and give all family members access to make changes without fear of wrecking the budget.

For more context, each year, my partner forecasts a detailed budget for us. We know some expenses fluctuate—a grocery trip might cost $100 one time and $150 the next. We use averages from the past year to estimate and project those variable expenses. This way, we manage to live comfortably without having to scale back in tighter months, fitting in bigger purchases when possible, and working towards an annual savings goal.

Top financial apps chart from Sensor Tower

But here’s where the challenge lies: My partner, as incredible as he is, is not a visualist. He can navigate a sea of spreadsheet cells effortlessly, which is something I struggle with (especially when it comes to budgeting). I need a big picture, ideally represented in a neat, visual chart or graph that clearly illustrates our financial forecast.

Then there’s the issue of access and updates. Trying to maneuver a spreadsheet on your phone in the middle of a grocery store is far from convenient. And if you make an unplanned purchase, updating the sheet without disrupting the formulas can be a real hassle, especially on a phone. This frustration made me think, “There has to be a better solution!”

Imagining the Ultimate AI Personal Finance Platform

Imagine an AI personal finance platform that “automagically” forecasts the future, securely connects to your bank and credit cards to pull transaction histories, and creates a budget considering dynamic and bucketed savings goals. This dream app would translate data into a clear dashboard, visually reporting on aspects like spending categories, monthly trends in macro and micro levels, amounts paid to interest, debt consolidation plans, and more.

It’s taken eight years of experiencing my partner’s budget management to truly understand a common struggle that many other families in the U.S. face: Advanced spreadsheet functions, essential in accounting and budgeting, are alien to roughly 73% of U.S. workers.

The extent of digital skills in the U.S. workforce according to OECD PIAAC survey data. Image Source: Information Technology and Innovation Foundation

Money Magnet aims to automate 90% of the budgeting process by leveraging AI recommendations about users’ personal finances to solve eight of the key findings outlined in a mock research study based on some of the challenges I had faced when developing a budget of my own.

Features to Simplify Your Finances

This dream budgeting tool is inspired by my own financial journey and the collective wish list of what an ideal personal finance assistant should be. Here’s a snapshot of the personal finance AI features that aim to position Money Magnet as one of the top AI financial tools on the market:

  1. Effortless Onboarding: Starting a financial journey shouldn’t be daunting. Money Magnet envisions a platform where setting up accounts and syncing banking information is as quick and effortless as logging into the app, connecting your bank accounts, and establishing some savings goals (if applicable).
  2. Unified Account Dashboard: Juggling multiple banking apps and credit card sites can be a circus act; trying to merge those separate ecosystems as a consumer is nearly impossible. Money Magnet proposes a unified dashboard, a one-stop financial overview that could declutter your digital financial life.
  3. Personalized AI Insights: Imagine a platform that knows your spending habits better than you do, offering bespoke guidance to fine-tune your budget. Money Magnet aims to be that savvy financial companion, using AI to tailor its advice just for you.
  4. Vivid Data Visualization: For those of us who see a blur of numbers on statements and spreadsheets, Money Magnet could paint a clearer picture with vibrant graphs and charts—turning the abstract into an understandable, perceivable, engaging, and dynamic visual that encourages you to monitor the trends.
  5. Impenetrable Security: When dealing with sensitive personal and financial details, security is non-negotiable. Money Magnet will prioritize protecting your financial data with robust encryption and authentication protocols, so your finances are as secure as Fort Knox.
  6. Intelligent Budget Optimization and Forecasting: No more cookie-cutter budget plans that force your spending to fit conventional categorization molds! Money Magnet will learn your preferences and forecast from your historic spending, suggesting ways to cut back on lattes or add to your savings—all personalized to your real-world spending and projected into the future to avoid pinch points.
  7. Smooth Bank Integrations: Another goal of Money Magnet is to eliminate the all-too-common bank connection hiccups where smaller banks and credit unions don’t get as much connectivity as the larger banks, ensuring a seamless link between your financial institutions and the app.
  8. Family Financial Management: Lastly, Money Magnet should be a tool where managing family finances is a breeze. Money Magnet could allow for individual family profiles, making it easier to teach kids about money and collaborate on budgeting without stepping on each other’s digital toes or overwriting a budget. It’s important for those using Money Magnet to know it can’t be messed up, and that any action can always be reverted.

See the Money Magnet Final Projects During Our Closing Ceremony on March 28

Attend the Hackonomics 2024 Showcase and Awards Ceremony on March 28 and see how our participating hackathon teams turned these eight pillars of financial management into a reality through their Money Magnet projects. The event is online, free of charge, and open to the public. Hope to see you there!

Decoding Data Jargon: Your Key to Understanding Data Science Terms

A myriad of terms are used in the data science field, where statistics and artificial intelligence are employed to discover actionable insights from data. For example, data science can be used by banks for fraud detection, or by streaming services for content recommendations. This post focuses on some of the key statistical terms commonly used within data science and concludes with a few remarks on using data science terminology correctly.

Definitions of Key Data Science Terms

Let’s look at some of the key terms in data science that you need to have a grasp on.

Numeric and Categorical Data

Data can be either numeric (or quantitative) or categorical (or qualitative). Numeric data represents quantities or amounts. Categorical data represents attributes that can be used to group or label individual items. If a student is a first-generation college student who is taking 17 semester units, then the student’s educational generation is categorical and the number of units is numeric.

Types of Statistics

When one is introduced to the use of statistics in data science, terms generally fall within one of the two main branches of statistics that serve different purposes in the analysis of data: descriptive statistics and inferential statistics.

Descriptive statistics summarize and organize characteristics of a data set. They give a snapshot of the data through numbers, charts, and graphs without making conclusions beyond the data analyzed or making predictions.

Descriptive Statistics: Mean, Median, Mode, Standard Deviation, and Correlation

Measures of central tendency provide a central point around which the data is distributed and measures of variability describe the spread of the data. The two most common measures of central tendency for numeric data are the mean and the median. The most common measure of central tendency for categorical data is the mode. The mean in data science is the average value (sum all of the values and divide by the number of observations). The median in data science is the middle value, and the mode is the most common value. 

Note that while the mode is generally used for categorical data, numeric data can also have modes. Consider the following made-up data set that is listed in order for simplicity: 2, 3, 7, 9, 9. The mode is 9 since it is the only value that shows up more than once. The median is 7 since it is precisely the middle value, and the mean is 30/5 = 6. The most widely used measure of variability is the standard deviation, which can be thought of as roughly the average distance that each observation is from the mean. In the toy example above, the sample standard deviation is approximately 3.32, so on average each number in the data set is about 3.32 away from the mean.

All of the aforementioned descriptive statistics are for univariate data (i.e., data with only one variable). More often in data science, we look at data that is multivariate. For instance, one could have two variables—the height and weight of NBA players. A descriptive statistic that describes the relationship between these variables is called the correlation. The correlation is a value between -1 and 1 and represents the strength and direction of the relationship.

Inferential Statistics: Confidence Intervals and Hypothesis Tests

Now let’s turn to some key terms from inferential statistics that are used in data science. There are two main types of inferential statistics: confidence intervals and hypothesis tests. Confidence intervals give an estimate of an unknown population value. Hypothesis tests determine whether a data set differs significantly from an assumed population value at a certain level of confidence.

For example, a confidence interval estimating the average (mean) height of NBA players in inches could be (75 inches, 81 inches). For a hypothesis test, by contrast, we can claim that the average height of NBA players is 78 inches and then test to see if our data differs substantially from that value. If our data set has a sample mean of 74 inches, then it is likely to show statistical significance because our mean is so different from the assumed population mean of 78 inches. If instead our data set has a sample mean of 77 inches, then it is unlikely to show statistical significance since our sample mean and the assumed population mean are close.
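
The core of that hypothesis test is a t statistic, which can be sketched with the standard library. The height sample below is invented for illustration; in practice one would use a library routine such as `scipy.stats.ttest_1samp` to also obtain a p-value:

```python
import math
import statistics

# Invented sample of NBA player heights (inches); we test the claim
# that the population mean is 78 inches.
sample = [74, 75, 73, 76, 74, 75, 73, 74]
mu0 = 78

n = len(sample)
xbar = statistics.mean(sample)
s = statistics.stdev(sample)

# One-sample t statistic: how many standard errors the sample mean
# lies from the hypothesized mean. A large |t| suggests significance.
t = (xbar - mu0) / (s / math.sqrt(n))
print(xbar, round(t, 2))  # 74.25 -10.25
```

A t statistic this far from zero mirrors the 74-inch scenario in the text: the sample mean is many standard errors below the hypothesized 78 inches, so the difference is very likely statistically significant.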

For a much more technical overview of statistical significance, confidence intervals, and hypothesis testing, please see our post “Rejecting the Null Hypothesis Using Confidence Intervals.”

How to Use Data Science Terms Wisely

Time now for an anecdote. A friend of mine—let’s call him Yinzer—was giving a presentation to his boss. He was tasked with presenting descriptive statistics on the company’s data. He included in his presentation a descriptive statistic called the kurtosis since that value was produced by the software. Yinzer’s boss asked him, “What is kurtosis?” Yinzer didn’t know and was unable to answer the question. 

The moral of the story: only use data science terms (such as mean, median, standard deviation, correlation, and hypothesis testing) if you are confident that you can explain them.

Some Additional Tips for Using Data Science Terminology

Here are some additional tips for using data science terminology if you are a beginner in the field:

Focus on understanding, not memorizing: Don’t try to memorize every term you encounter. Instead, focus on grasping the underlying concepts and how they relate to each other. This will allow you to learn new terms organically as you progress.

Practice with real data: The best way to solidify your understanding is to apply it. Find beginner-friendly datasets online and use them to practice basic data cleaning, analysis, and visualization. This will expose you to terminology in a practical setting.

Engage with the data science community: Join online forums, attend meetups, or connect with other data science beginners. Discussing concepts and terminology with others can solidify your understanding and expose you to new terms in a collaborative environment.

Learn Data Science at Flatiron in 15 Weeks

Full-time students in Flatiron’s Data Science Bootcamp can graduate in under four months with the skills needed to land data analyst, AI engineer, and data scientist jobs. Book a 10-minute call with our Admissions team to learn more.

How to Achieve Portfolio Optimization With AI

Here’s a fact: Employers are seeking candidates with hands-on experience and expertise in emerging technologies. Portfolio optimization using Artificial Intelligence (AI) has become a key strategy for people looking to break into the tech industry. Let’s look at some of the advantages of having an AI project in a portfolio, and how portfolio optimization with AI can be a game changer when it comes to getting your foot in the door at a company.

The Pros of Having AI Projects in a Portfolio

For people seeking to transition into the tech industry, having AI projects in their portfolios can be a game-changer when it comes to landing coveted roles and advancing their careers. By showcasing hands-on experience with AI technologies and their applications in real-world projects, candidates can demonstrate their readiness to tackle complex challenges and drive innovation in any industry. Employers value candidates who can leverage AI to solve problems, optimize processes, and deliver tangible results, making AI projects a valuable asset for aspiring tech professionals.

Achieving portfolio optimization with AI by integrating AI into portfolios is quickly becoming a cornerstone of success for tech job seekers. However, portfolio optimization with AI involves more than just adopting the latest technology. It requires a strategic business approach and a deep understanding of Artificial Intelligence. Below are details about Hackonomics, Flatiron School’s AI-powered budgeting hackathon.

The Components of Flatiron’s AI Financial Platform Hackathon

Identifying the Right Business Problem

The Hackonomics project revolves around cross-functional teams of recent Flatiron graduates building an AI-driven financial platform to increase financial literacy and provide individualized financial budgeting recommendations for customers. Identifying the right business problem entails understanding the unique needs and challenges of a target audience, ensuring that a solution addresses critical pain points and that the utilization of AI delivers tangible value to users.      

AI Models

At the core of Hackonomics are machine learning models meticulously designed to analyze vast amounts of financial data. These models will enable the uncovering of valuable insights into user spending patterns, income sources, and financial goals, laying the foundation for personalized recommendations and budgeting strategies.

Software and Product Development

As students develop their Hackonomics projects, continuous product development and fine-tuning are essential for optimizing performance and usability. This involves iterating on platform features (including UI design and software engineering functionality) and refining AI algorithms to ensure that the platform meets the evolving needs of users and delivers a seamless and intuitive experience.

Security and Encryption

Ensuring the security and privacy of users’ financial data is paramount. The Hackonomics project incorporates robust security measures, including encryption techniques, to safeguard sensitive information from outside banking accounts that need to be fed into the platform. Additionally, multi-factor authentication (MFA) adds an extra layer of protection, mitigating the risk of unauthorized access and enhancing the overall security posture of our platform.

Join Us at the Hackonomics Project Showcase on March 28

From March 8 to March 25, graduates of Flatiron School’s Cybersecurity, Data Science, Product Design, and Software Engineering bootcamps will collaborate to develop fully functioning AI financial platforms that analyze user data, provide personalized recommendations, and empower individuals to take control of their financial futures.

The Hackonomics outcomes are bound to be remarkable. Participants will create a valuable addition to their AI-optimized project portfolios and gain invaluable experience and skills that they can showcase in job interviews and beyond.

The judging of the projects will take place from March 26 to 27, followed by the showcase and awards ceremony on March 28. This event is free of charge and open to prospective Flatiron School students, employers, and the general public. Reserve your spot today at the Hackonomics 2024 Showcase and Awards Ceremony and don’t miss this opportunity to witness firsthand the innovative solutions that emerge from the intersection of AI and finance. 

Unveiling Hackonomics, Flatiron’s AI-Powered Budgeting Hackathon

Are you interested in learning about how software engineering, data science, product design, and cybersecurity can be combined to solve personal finance problems? Look no further, because Flatiron’s AI-powered budgeting hackathon—Hackonomics—is here to ignite your curiosity.

This post will guide you through our Hackonomics event and the problems its final projects aim to solve. Buckle up and get ready to learn how we’ll revolutionize personal finance with the power of AI.

Source: Generated by Canva and Angelica Spratley

Unveiling the Challenge

Picture this: a diverse cohort of recent Flatiron bootcamp graduates coming together on teams to tackle an issue that perplexes and frustrates a huge swath of the population—personal budgeting.

Hackonomics participants will be tasked with building a financial planning application named Money Magnet. What must Money Magnet do? Utilize AI to analyze spending patterns, income sources, and financial goals across family or individual bank accounts.

The goal? To provide personalized recommendations for optimizing budgets, identifying potential savings, and achieving financial goals through a dynamic platform that contains a user-friendly design with interactive dashboards, a personalized recommendation system to achieve budget goals, API integration of all financial accounts, data encryption to protect financial data, and more.

The Impact of AI in Personal Finance

Let’s dive a little deeper into what this entails. Integrating AI into personal finance isn’t just about creating fancy algorithms; it’s about transforming lives through the improvement of financial management. Imagine a single parent struggling to make ends meet, unsure of where their hard-earned money is going each month. With AI-powered budgeting, they can gain insights into their spending habits, receive tailored recommendations on how to save more effectively, and ultimately, regain control of their financial future. It’s about democratizing financial literacy and empowering individuals from all walks of life to make informed decisions about their money.

Crafting an Intuitive Technical Solution Through Collaboration

As the teams embark on this journey, the significance of Hackonomics becomes abundantly clear. It’s not just about building an advanced budgeting product. It’s about building a solution that has the power to vastly improve the financial health and wealth of many. By harnessing the collective talents of graduates from Flatiron School’s Cybersecurity, Data Science, Product Design, and Software Engineering bootcamps, Hackonomics has the opportunity to make a tangible impact on people’s lives.

Let’s now discuss the technical aspects of this endeavor. The platforms must be intuitive, user-friendly, and accessible to individuals with varying levels of financial literacy. They also need to be up and running with personalized suggestions in minutes, not hours, ensuring that anyone can easily navigate and understand their financial situation. 

Source: Generated by Canva and Angelica Spratley

Embracing the Challenge of Hackonomics

Let’s not lose sight of the bigger picture. Yes, the teams are participating to build a groundbreaking platform, but they’re also participating to inspire change. Change in the way we think about personal finance, change in the way we leverage technology for social good, and change in the way we empower individuals to take control of their financial destinies.

For those participating in Hackonomics, it’s not just about building a cool project. It’s about honing skills, showcasing talents, and positioning themselves for future opportunities. As participants develop their AI-powered budgeting platforms, they’ll demonstrate technical prowess, creativity, collaborative skills, and problem-solving abilities. In the end, they’ll enhance their portfolios with AI projects, bettering their chances of standing out to potential employers. By seizing this opportunity, they’ll not only revolutionize personal finance but also propel their careers forward.

Attend the Hackonomics Project Showcase and Awards Ceremony Online

Participation in Hackonomics is exclusively for Flatiron graduates. Participants will build their projects from March 8 through March 25. Winners will be announced during our project showcase and awards ceremony closing event on March 28.

If you’re interested in attending the showcase and ceremony on March 28, RSVP for free through our Eventbrite page Hackonomics 2024 Showcase and Awards Ceremony. This is a great opportunity for prospective students to see the types of projects they can work on should they decide to apply to one of Flatiron’s bootcamp programs.

The Data on Barbie, Greta Gerwig, and Best Director Snubs at the Oscars

When the 2024 Academy Award nominees were announced in late January, one of the most hotly discussed topics was that Greta Gerwig, director of Barbie, was not nominated for Best Director, despite the film being nominated for Best Picture. I assumed a Best Director nomination went hand-in-hand with a Best Picture nomination, so how common is it for a film to be nominated for Best Picture, but not Best Director? It turns out, fairly often, at least since 2009.

50 years of Best Picture and Best Director Oscar nominations
The chart above comes from Flatiron’s analysis of over 50 years of Best Picture and Best Director Oscar nominations. Films that win either of these two awards are usually nominated in both categories.

From 1970 to 2008, the Best Picture and Best Director categories had five nominees each. It was common to see four of the five Best Picture nominees also receiving a nomination for Best Director. And in 32 of these 39 years, the film that won Best Picture also won Best Director.

In 2009, the Best Picture nomination limit increased to 10 films. Best Director remained capped at five, so naturally, this resulted in more Best Director snubs than before. In terms of winners, the larger pool of Best Picture nominees seems to be aiding in separating the two awards. Best Picture and Best Director Oscars have gone to two different films in six of the last 14 years (this happened only seven times in the 39 years before 2009).


Although it’s no longer uncommon for a film to receive a Best Picture nomination without one for Best Director, Barbie wasn’t just any film. Barbie was one half of the cultural phenomenon known as Barbenheimer: a mashup of two highly anticipated and starkly different films, Gerwig’s Barbie and director Christopher Nolan’s historical biopic Oppenheimer, both of which hit theaters on July 21, 2023. The goal of seeing both films back-to-back became one of the defining characteristics of the Barbenheimer phenomenon. While both films were hugely successful at the domestic and international box office, Barbie out-grossed Oppenheimer by an estimated half-billion dollars worldwide.

The two films dominated the zeitgeist for much of 2023 and both received enormous critical acclaim. Oppenheimer has dominated this awards season, however, with 13 Oscar nominations garnered and multiple important wins at other film awards ceremonies leading up to the Academy Awards on March 10.

We’ll return to how we think about “importance” in the context of nominations, but for now, let’s compare the two films along the lines of major award ceremonies, ratings, and box office revenue.

Barbie vs Oppenheimer

analysis comparing Barbie and Oppenheimer performance by major awards
The graphic above comes from our analysis comparing Barbie and Oppenheimer. Both films have numerous award nominations and have brought in over two billion dollars combined.

Minus its take at the People’s Choice Awards, Oppenheimer has taken home more awards overall, despite having a similar number of nominations at most award shows. Barbie appeared to be on a roll this award season, with nominations for picture, director, screenplay, actress, and supporting actor at the Golden Globes and Critics Choice Awards in early January. However, Greta Gerwig was left out of the director category when the Oscar nominees were announced on January 23. This leads to the question, what films are most similar to Barbie, not just by nomination count, but across major categories? And were those films nominated for Best Director?

Movies Like Barbie

We began our Best Director snubs analysis at Flatiron by collecting all past nominees across the entire history of the awards ceremonies noted in the image above—swapping out the People’s Choice Awards for the Writers Guild Awards—for a comprehensive dataset of non-fan nominations. We also merged categories like Best Adapted Screenplay and Best Original Screenplay into one screenplay category for ease of comparison. Similarly, we lumped all acting categories–male, female, lead, and supporting–into one, and all Best Picture categories into one if split into drama and comedy/musical categories (like the Golden Globes does).

With a dataset of over 3,000 nominees going back to the 1920s, we found films most similar to Barbie across our grouped screenplay, grouped actor(s), director, and picture categories using Euclidean distance, a method for finding the distance between two data points. The five films below are the most similar to Barbie according to the awards and groupings we’ve selected. Interestingly, these five films, including Gerwig’s 2017 debut film, Lady Bird, all received a Best Director nomination at the Oscars (while Gerwig’s directing work on Barbie did not).
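To make the idea concrete, here is a minimal sketch of a Euclidean distance comparison. The film names and nomination-count features below are hypothetical stand-ins, not our actual dataset:

```python
import numpy as np

# Hypothetical nomination-count features per film:
# [screenplay noms, acting noms, directing noms, picture noms]
barbie = np.array([4, 5, 3, 5])
candidate_films = {
    "Film A": np.array([4, 4, 4, 5]),
    "Film B": np.array([1, 2, 0, 1]),
}

# Euclidean distance: the straight-line distance between two points
# in feature space; a smaller distance means more similar films.
def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

distances = {name: euclidean(barbie, feats)
             for name, feats in candidate_films.items()}

# Rank films from most to least similar to Barbie
for name, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(f"{name}: {d:.2f}")
```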

comparing barbie's nominations to other high-performing movies from previous award seasons

Predicting Best Director Snubs at the Oscars

A sample size of five is certainly not enough evidence to make a definitive claim of a snub, so we developed a predictive model that classifies a film as a Best Director nominee based on the other nominations it received, either at the Oscars or previous award shows. Our final model achieved 91% accuracy. For the astute reader, it also reached 93% precision and 96% recall. 
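Our actual model and features aren’t reproduced here, but a classifier of this kind can be sketched with scikit-learn. The code below trains a logistic regression on synthetic stand-in data (every feature and label is fabricated for illustration) and reports the same three metrics:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score

rng = np.random.default_rng(42)

# Synthetic stand-in data: each row is a film's nominations elsewhere
# (1 = nominated); the target is an Oscar Best Director nomination.
n_films = 500
X = rng.integers(0, 2, size=(n_films, 4))  # picture, screenplay, GG director, DGA
# Make the fabricated label depend on three of the four features
y = ((X[:, 0] + X[:, 2] + X[:, 3] + rng.random(n_films)) > 2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = LogisticRegression().fit(X_train, y_train)
pred = model.predict(X_test)

print("accuracy:", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred))
print("recall:", recall_score(y_test, pred))
```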

Based on films from 1927 to 2022, the best predictor of a Best Director nomination at the Oscars is a Best Picture nomination at the Oscars. This isn’t surprising, considering the overlap in nominees that we observed in the first image at the top of the article.

Other notable predictors are Best Screenplay at the Oscars or Critics Choice Awards, and Best Director at the Golden Globes or Directors Guild of America Awards (DGA). These predictors align with intuition, given the importance of a good script and how common it is for a filmmaker to hold the title of writer/director. In the case of the DGA, it’s hard to think of a more qualified group to identify the best directors of the year than the 19,000-plus directors who make up the guild’s membership.

Trained Model Predictions

Finally, using our trained model, we applied it to our list of 2023 films that received at least one nomination in a screenplay, acting, directing, or picture category. Given the long list of accolades received by Barbie at the Golden Globes, Critics Choice Awards, British Academy Film Awards (BAFTA), and all the filmmaking guild awards, our model predicted Greta Gerwig to have a 76% chance of snagging a Best Director nomination. Considering she was in third, just behind Christopher Nolan for Oppenheimer and Yorgos Lanthimos for Poor Things, I’d call this a snub. (Gerwig tied for third with Justine Triet for Anatomy of a Fall.)
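As a sketch of how predicted probabilities can be rank-ordered, the snippet below trains a toy logistic regression and scores a few hypothetical films with scikit-learn’s predict_proba (all film names, features, and probabilities here are made up):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy trained model on fabricated binary nomination features
X_train = rng.integers(0, 2, size=(200, 3))
y_train = (X_train.sum(axis=1) >= 2).astype(int)
model = LogisticRegression().fit(X_train, y_train)

films_2023 = {
    "Film A": [1, 1, 1],
    "Film B": [1, 1, 0],
    "Film C": [0, 0, 1],
}
X_new = np.array(list(films_2023.values()))

# predict_proba returns [P(class 0), P(class 1)] per row;
# column 1 is the predicted chance of a Best Director nomination
probs = model.predict_proba(X_new)[:, 1]

# Rank-order films by predicted probability, highest first
for name, p in sorted(zip(films_2023, probs), key=lambda t: -t[1]):
    print(f"{name}: {p:.0%}")
```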

which best director nominations were predicted by a trained model

Best Director Snubs and Flatiron’s Analysis

Rank-ordering the predicted probability of receiving the directorial nomination, the 2017 film Three Billboards Outside Ebbing, Missouri by writer/director Martin McDonagh was our model’s biggest snub. A film that initially received wide acclaim, it later faced criticism over its portrayal of misogyny and racism. Coincidentally, Greta Gerwig was one of the five director nominees that year, alongside Guillermo del Toro, Christopher Nolan, Jordan Peele, and Paul Thomas Anderson—a star-studded list of filmmakers if ever there was one.

the biggest "best director" snubs over the last 25 years
The table above shows where our model was highly confident—but ultimately, incorrect—that a film would receive the Best Director nod.

It’s worth noting that many of the films listed in our table above also appear in a recent Variety article that ranked the biggest Best Director snubs over the last 25 years. While the writer of the Variety article does not discuss his methodology, it’s always a good idea in data science to validate findings with subject matter experts. In the case of our analysis and the Variety article analysis, there seems to be some agreement. 

Final Thoughts

As with all predictive models, our model is only as good as the data it learns from. A common criticism of the Academy is its lack of nominating women and people of color across categories, particularly for Best Director. Mitigating bias and ensuring fairness in predictive models are important concepts in Big Data Ethics, but we’ll save the ways one could address these issues for another post.

Learn Data Science at Flatiron School

Data analyst is just one of the career paths you can embark on after graduating from Flatiron’s Data Science Bootcamp. Our bootcamp offers students the opportunity to graduate and begin working in the field in as little as 15 weeks. Download the course syllabus for free to see what you can learn!

Header photo courtesy of Warner Bros. Pictures

Using Scikit-Learn for Machine Learning in Python

Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Given that Python is the most widely used language in data science and taught in Flatiron’s Data Science Bootcamp, we’ll begin by describing what the aforementioned terms mean before turning to the topic of using scikit-learn.

An interpreted language executes instructions directly, without requiring a separate compilation step, which makes it more flexible than a compiled language like C.

An object-oriented language is one that is designed around data or objects.

A high-level programming language is one that can be easily understood by humans since its syntax reflects human usage. 

Finally, dynamic semantics means that the meaning and type of an expression are determined at runtime and can be updated based on context. All of these attributes make Python well suited to data science, since it is a flexible, easy-to-read language that works well with data.
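A quick illustration of the last two points, dynamic semantics and high-level, human-readable syntax:

```python
# In Python, a name's type is resolved at runtime and can change:
x = 42           # x refers to an int
print(type(x))   # <class 'int'>

x = "forty-two"  # the same name now refers to a str
print(type(x))   # <class 'str'>

# High-level, readable syntax: summing squares without an explicit loop
total = sum(n ** 2 for n in range(5))
print(total)     # 30
```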

Python’s Libraries

Python’s power is expanded by the use of libraries. A Python library is a collection of related modules that allow one to perform common tasks without having to create the functions for the tasks anew. Two libraries that are inevitably used when working with Python in data science are NumPy and pandas. The former allows one to efficiently deal with large matrices and perform mathematical operations on those objects. The latter offers data structures and operations for data manipulation and analysis.
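A small taste of each library (the numbers and product names here are made up for illustration):

```python
import numpy as np
import pandas as pd

# NumPy: efficient element-wise math on arrays, no explicit loops needed
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])
print(matrix.mean())      # mean of all elements
print(matrix @ matrix.T)  # matrix multiplication

# pandas: labeled tabular data with built-in manipulation and analysis
df = pd.DataFrame({
    "product": ["A", "B", "A", "B"],
    "sales": [10, 20, 30, 40],
})
print(df.groupby("product")["sales"].sum())  # total sales per product
```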

In Flatiron’s Data Science Bootcamp, NumPy and pandas are among the first tools that one learns when being introduced to Python. A number of other widely used Python tools for data science are also taught in the bootcamp, including Matplotlib for visualization, Keras for deep learning, and scikit-learn for machine learning.

From here on out, we’ll focus on scikit-learn since it is the primary library for machine learning in Python.


Two Types of Machine Learning

Machine learning is a branch of artificial intelligence and computer science that uses data and algorithms to imitate the way humans learn. Machine learning is often distinguished between supervised and unsupervised learning.

In supervised learning, the algorithm learns from labeled data. Here, each example in the data set is associated with a corresponding label or output. An example of supervised learning would be an algorithm that is learning to correctly identify spam or not-spam email. 

In unsupervised learning, the algorithm learns patterns and structures from unlabeled data without any guidance in the form of labeled outputs. Market clustering, where an algorithm groups individuals into clusters based on demographic data, is an example of unsupervised learning.
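The two paradigms can be contrasted in a few lines of scikit-learn. Below, the same toy dataset is handled first with labels (supervised) and then without (unsupervised):

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Toy 2-D data: 200 points drawn around 2 centers
X, y = make_blobs(n_samples=200, centers=2, random_state=0)

# Supervised: the algorithm is given the labels y
# (analogous to emails already marked spam / not-spam)
clf = LogisticRegression().fit(X, y)
print("supervised accuracy:", clf.score(X, y))

# Unsupervised: same points, but no labels are provided;
# KMeans discovers the two clusters on its own
# (analogous to market clustering from demographic data)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster sizes:", sorted(int((km.labels_ == k).sum()) for k in (0, 1)))
```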

Using Scikit-Learn: Key Features

Scikit-learn is a popular open-source machine learning library for the Python programming language. It provides simple and efficient tools for data mining and data analysis and is built on top of other scientific computing packages such as NumPy, SciPy, and matplotlib. The following are all key features that help data scientists using scikit-learn work smoothly and efficiently. 

Consistent API

Scikit-learn provides a uniform and consistent API for various machine learning algorithms, making it easy to use and to switch between different algorithms. Other libraries, such as the deep learning library Keras, mimic the scikit-learn syntax, which makes learning those libraries easier.
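A brief sketch of what this uniformity looks like in practice: three very different algorithms, one identical fit/score workflow.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Every estimator follows the same pattern: construct, fit, score.
# Swapping algorithms means changing one line, not the whole workflow.
models = [
    DecisionTreeClassifier(random_state=0),
    LogisticRegression(max_iter=1000),
    KNeighborsClassifier(),
]
for model in models:
    model.fit(X, y)
    print(type(model).__name__, "accuracy:", round(model.score(X, y), 3))
```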

Wide Range of Algorithms

It offers a comprehensive suite of machine learning algorithms for various tasks, including:

  • Classification (identifying which category an object belongs to)
  • Regression (predicting a continuous-valued attribute associated with an object)
  • Clustering (automatic grouping of similar objects into sets)
  • Dimensionality reduction (reducing the number of random variables to consider)
  • Model selection (comparing, validating, and choosing parameters and models)
  • Preprocessing (feature extraction and normalization)
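Several of these pieces compose naturally. The sketch below chains preprocessing (normalization) and classification into a pipeline, then evaluates it with cross-validation, one common model selection tool:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Preprocessing chained with classification in a single pipeline object
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Model selection tooling: 5-fold cross-validated accuracy in one call
scores = cross_val_score(pipe, X, y, cv=5)
print("mean CV accuracy:", round(scores.mean(), 3))
```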

Ease of Use

Scikit-learn is designed with simplicity and ease of use in mind, making it accessible to both beginners and experts in machine learning. This is a result not only of scikit-learn being well designed, but also of it being a Python library.

Integration with Other Libraries

It integrates seamlessly with other Python libraries such as NumPy, SciPy, and matplotlib (all of which it is built on). This allows for efficient data manipulation, computation, and visualization.

Community and Documentation

Scikit-learn has a large and active community of users and developers, providing extensive documentation, tutorials, and examples to help users start solving real-world problems. In our experience, scikit-learn’s documentation is among the best available for any programming language or library.

Performance and Scalability

While scikit-learn may not be optimized for very large datasets or high-performance computing, it offers good performance and scalability for most typical machine learning tasks. For very large datasets and cloud computing, there are similar libraries available, such as PySpark’s machine learning library, MLlib.

Using Scikit-Learn: Conclusion

Overall, scikit-learn is a powerful and versatile library. It’s a standard tool for machine learning practitioners and researchers due to its simplicity, flexibility, and wide range of capabilities. It is difficult to work as a data scientist using Python without being comfortable and proficient in scikit-learn, which is why Flatiron School emphasizes it throughout its curriculum.

Want to Learn About Careers in Data Science?

Learn about data science career paths through our Data Science Bootcamp page. Curious to know what students learn during their time at Flatiron? Join us for our Final Project Showcase and see work from recent grads.

Rejecting the Null Hypothesis Using Confidence Intervals

In an introductory statistics class, three main topics are taught: descriptive statistics and data visualization, probability and sampling distributions, and statistical inference. Within statistical inference, two key methods are taught, viz. confidence intervals and hypothesis testing. While both methods are always covered when learning data science and related fields, the relationship between them is rarely properly elucidated.

In this article, we’ll begin by defining and describing each method of statistical inference in turn and along the way, state what statistical inference is, and perhaps more importantly, what it isn’t. Then we’ll describe the relationship between the two. While it is typically the case that confidence intervals are taught before hypothesis testing when learning statistics, we’ll begin with the latter since it will allow us to define statistical significance.

Hypothesis Tests

The purpose of a hypothesis test is to answer whether random chance might be responsible for an observed effect. Hypothesis tests use sample statistics to test a hypothesis about population parameters. The null hypothesis, H0, is a statement that represents the assumed status quo regarding a variable or variables and it is always about a population characteristic. Some of the ways the null hypothesis is typically glossed are: the population variable is equal to a particular value or there is no difference between the population variables. For example:

  • H0: μ = 69 in (The mean height of the population of American men is 69 inches.)
  • H0: p1-p2 = 0 (The difference in the population proportions of women who prefer football over baseball and the population proportion of men who prefer football over baseball is 0.)

Note that the null hypothesis always has the equal sign.

The alternative hypothesis, denoted either H1 or Ha, is the statement that is opposed to the null hypothesis (e.g., the population variable is not equal to a particular value or there is a difference between the population variables):

  • H1: μ > 69 in (The mean height of the population of American men is greater than 69 inches.)
  • H1: p1-p2 ≠ 0 (The difference in the population proportions of women who prefer football over baseball and the population proportion of men who prefer football over baseball is not 0.)

The alternative hypothesis is typically the claim that the researcher hopes to show and it always contains the strict inequality symbols (‘<’ left-sided or left-tailed, ‘≠’ two-sided or two-tailed, and ‘>’ right-sided or right-tailed).

When carrying out a test of H0 vs. H1, the null hypothesis H0 will be rejected in favor of the alternative hypothesis only if the sample provides convincing evidence that H0 is false. As such, a statistical hypothesis test is only capable of demonstrating strong support for the alternative hypothesis by rejecting the null hypothesis.

When the null hypothesis is not rejected, it does not mean that there is strong support for the null hypothesis (since it was assumed to be true); rather, only that there is not convincing evidence against the null hypothesis. As such, we never use the phrase “accept the null hypothesis.”

In the classical method of performing hypothesis testing, one would have to compute what is called the test statistic and use a table to find the corresponding probability. Happily, due to advancements in technology, one can use Python (as is done in Flatiron’s Data Science Bootcamp) and get the required value directly using a Python library like statsmodels. This value is the p-value, short for probability value.

The p-value is a measure of inconsistency between the hypothesized value for a population characteristic and the observed sample. It is the probability, computed under the assumption that the null hypothesis is true, of obtaining a test statistic value at least as extreme as the one observed. If the p-value is less than or equal to the probability of a Type I error, then we can reject the null hypothesis and we have sufficient evidence to support the alternative hypothesis.

Typically, the probability of a Type I error ɑ, more commonly known as the level of significance, is set to 0.05, but it is often prudent to set it to a smaller value such as 0.01 or 0.001. Thus, if the p-value ≤ ɑ, then we reject the null hypothesis and interpret this as saying there is a statistically significant difference between the sample and the hypothesized population value. So if the p-value = 0.03 ≤ 0.05 = ɑ, we would reject the null hypothesis and have statistical significance, whereas if the p-value = 0.08 > 0.05 = ɑ, we would fail to reject the null hypothesis and would not have statistical significance.
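As a sketch of how this looks in code, SciPy (an alternative to statsmodels for this task) computes the test statistic and p-value directly. The heights below are simulated rather than real survey data, and we test H0: μ = 69 inches, the mean-height value from our earlier example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated sample of 40 heights (inches); the generating mean is 70,
# so the null hypothesis H0: mu = 69 is in fact false here.
heights = rng.normal(loc=70, scale=3, size=40)

alpha = 0.05  # level of significance
t_stat, p_value = stats.ttest_1samp(heights, popmean=69)

print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")
if p_value <= alpha:
    print("Reject H0: statistically significant difference.")
else:
    print("Fail to reject H0: no convincing evidence against H0.")
```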

Confidence Intervals

The other primary method of statistical inference is the confidence interval. While hypothesis tests are concerned with testing a claim, the purpose of a confidence interval is to estimate an unknown population characteristic. A confidence interval is an interval of plausible values for a population characteristic. It is constructed so that we have a chosen level of confidence that the actual value of the characteristic lies between the interval’s lower and upper endpoints.

The structure of an individual confidence interval is the sample estimate of the variable of interest ± the margin of error. The margin of error is the product of a multiplier value and the standard error, s.e., which is based on the standard deviation and the sample size. The multiplier is where the probability, or level of confidence, is introduced into the formula.

The confidence level is the success rate of the method used to construct a confidence interval. A confidence interval estimating the proportion of American men who state they are an avid fan of the NFL could be (0.40, 0.60) with a 95% level of confidence. The level of confidence is not the probability that the population characteristic is in the confidence interval; rather, it refers to the method used to construct the interval.

For example, a 95% confidence interval is interpreted as follows: if one constructed 100 confidence intervals using this method, we would expect about 95 of them to contain the true population characteristic.
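A 95% confidence interval like this can be computed by hand from the estimate ± margin-of-error structure described above. The survey counts below are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical survey: 200 of 400 American men say they are avid NFL fans
successes, n = 200, 400
p_hat = successes / n  # sample estimate: 0.5

# 95% CI for a proportion: sample estimate +/- multiplier * standard error
confidence = 0.95
z = stats.norm.ppf(1 - (1 - confidence) / 2)  # multiplier, about 1.96
se = np.sqrt(p_hat * (1 - p_hat) / n)         # standard error
lower, upper = p_hat - z * se, p_hat + z * se

print(f"{confidence:.0%} CI: ({lower:.3f}, {upper:.3f})")
```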

Errors and Power

A Type I error, or false positive, is the error of finding a difference that is not there; the probability of incorrectly rejecting a true null hypothesis is ɑ, the level of significance. It follows that the probability of correctly failing to reject a true null hypothesis is its complement, viz. 1 – ɑ. For a particular hypothesis test, if ɑ = 0.05, then its complement would be 0.95, or 95%.

While we are not going to expand on these ideas here, we note two related probabilities. A Type II error, or false negative, is the failure to reject a false null hypothesis; its probability is β. The power of a test is the probability of correctly rejecting a false null hypothesis, so power = 1 – β. In common statistical practice, one typically speaks only of the level of significance and the power.
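The meaning of ɑ can be demonstrated with a small simulation: when the null hypothesis is true, a test at ɑ = 0.05 should incorrectly reject it about 5% of the time. This sketch uses simulated normal data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
alpha = 0.05

# 10,000 simulated experiments in which H0 is TRUE:
# each row is a sample of 30 values from a population with mean 69
samples = rng.normal(loc=69, scale=3, size=(10_000, 30))

# One-sample t-test of H0: mu = 69 for every row at once
_, p_values = stats.ttest_1samp(samples, popmean=69, axis=1)

# The fraction of (incorrect) rejections should hover around alpha
type_i_rate = (p_values <= alpha).mean()
print("observed Type I error rate:", type_i_rate)
```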

The following table summarizes these ideas, where the column headers refer to what is actually the case, but is unknown. (If the truth or falsity of the null hypothesis were truly known, we wouldn’t have to do statistics.)

                        H0 is True                  H0 is False
Reject H0               Type I error (ɑ)            Correct decision (power = 1 – β)
Fail to reject H0       Correct decision (1 – ɑ)    Type II error (β)

Hypothesis Tests and Confidence Intervals

Since hypothesis tests and confidence intervals are both methods of statistical inference, then it is reasonable to wonder if they are equivalent in some way. The answer is yes, which means that we can perform hypothesis testing using confidence intervals.

Returning to the example where we estimated the proportion of American men who are avid fans of the NFL, we had (0.40, 0.60) at a 95% confidence level. As a hypothesis test, we could test H0: p = 0.51 against H1: p ≠ 0.51. Since the null value of 0.51 lies within the confidence interval, we would fail to reject the null hypothesis at ɑ = 0.05.

On the other hand, if we test H0: p = 0.61 against H1: p ≠ 0.61, then since 0.61 is not in the confidence interval, we can reject the null hypothesis at ɑ = 0.05. Note that the confidence level of 95% and the level of significance ɑ = 0.05 = 5% are complements, which corresponds to the “H0 is True” column in the above table.

In general, for a two-sided test, one can reject the null hypothesis if the null value is not in the confidence interval, where the confidence level and the level of significance are complements. One can still perform one-sided tests with a confidence level and null value, but this adds a layer of complexity; it is also best practice to perform two-sided hypothesis tests, since doing so does not prejudice the direction of the alternative.
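For the two-sided case, the decision rule reduces to a simple containment check. A minimal sketch, using the NFL-fandom interval from above:

```python
def two_sided_test(null_value, ci_lower, ci_upper):
    """Reject H0 iff the null value lies outside the confidence interval.
    Valid when the confidence level and significance level are complements."""
    if ci_lower < null_value < ci_upper:
        return "fail to reject H0"
    return "reject H0"

# 95% CI from the NFL-fandom example: (0.40, 0.60), so alpha = 0.05
print(two_sided_test(0.51, 0.40, 0.60))  # fail to reject H0
print(two_sided_test(0.61, 0.40, 0.60))  # reject H0
```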

In this discussion of hypothesis testing and confidence intervals, we not only understand when these two methods of statistical inference can be equivalent, but now have a deeper understanding of statistical significance itself and therefore, statistical inference.

Learn More About Data Science at Flatiron

The curriculum in our Data Science Bootcamp incorporates the latest technologies, including artificial intelligence (AI) tools. Download the syllabus to see what you can learn, or book a 10-minute call with Admissions to learn about full-time and part-time attendance opportunities.

Kicking It Up a Notch: Exploring Data Analytics in Soccer

Soccer isn’t just a sport, it’s a global phenomenon. From the electrifying energy of packed stadiums to the shared passion of fans across continents, soccer unites the world like no other. Record-breaking viewership for the 2022 World Cup stands as a testament to this undeniable truth. Soccer’s global reach has fueled a data revolution within the sport. Data analytics is rapidly transforming soccer, impacting teams, players, and organizations through increasingly data-driven decisions. With the 2024 Major League Soccer (MLS) season kicking off, let’s look at how data analytics in soccer is changing the game, one insightful analysis at a time.

The use of data analytics in soccer can be loosely broken down into several key areas of focus.

  • Game planning: The meta analysis of games and matches to determine best play strategies
  • Performance: The hard stats, from individual players to teams to full leagues
  • Recruitment: Finding potential player and coaching talent 

Game Planning

Game planning itself can cover a wide range of topics, including player match-ups and game strategy. Analysis in this field involves utilizing previous games and matches to determine strategies against a given team.

Data analytics in soccer

Image source: Soccer Coach Weekly 

Opponent analysis is a tried and true method across all sports. This form of analysis can be very granular, looking at individual players in specific situations, such as penalty kick line-ups. It can also be high level, identifying overall trends and patterns in opponents’ play that can be exploited. The results of such analysis can drastically change how a team or player approaches a game. 

With the advent of new technology being utilized for data analytics in soccer, opponent analysis and game planning are being brought to new levels of complexity. The combination of wearable tracking devices and modern camera technology has opened the floodgates of data collection, resulting in a smorgasbord of play-by-play positional data that analysts can use to inform game planning decisions. The real-time capabilities of image detection and computer vision also allow coaches and staff to make in-the-moment decisions.

As technology continues advancing, avenues for game planning analysis include the utilization of virtual and augmented reality to coordinate, plan, and practice set-pieces and specific plays.


Performance

A major focus of data analytics in soccer is the collection and use of performance data to inform decisions. Front and center is the analysis of player performance to help develop and improve the individual player. Team performance is often analyzed as well, in conjunction with game planning and in the context of specific match-ups. Developing appropriate metrics to quantify performance is a vital part of this equation and is an ever-evolving field.

a soccer player kicking a ball down a pitch

Image source: Science for Sport

Individual player performance—and the way it is measured—has drastically changed across the lifetime of soccer analytics. Metrics like expected goals now utilize predictive analytics to quantify a player’s near-future performance (in conjunction with secondary statistics that gather a holistic view of a player’s team contributions). Monitoring and evaluating player performance is essential for trainers and coaches in helping improve player strengths and weaknesses.

A major advancement has been the introduction of non-intrusive wearable devices that can monitor and collect a player’s vital responses. While there are privacy, consent, and data security concerns when it comes to wearables, they have amazing potential to not only help improve player performance on the field but, more importantly, help players prevent (and recover from) serious injuries.

Predicting player and team performance utilizing machine learning algorithms continues to become more important, and has opened up a whole new avenue in data analytics in soccer when it comes to finding talent.


Recruitment

A vitally important part of a winning strategy is bringing together the right mix of talent across players and staff. Soccer organizations are putting a huge emphasis on academic-driven analytics, especially in regard to talent scouting. Currently, there are an extraordinary number of performance analysts working to aid recruitment efforts in the U.S. MLS. Sports journalist Ben Lyttleton sums it up nicely in the quote below, taken from his article on data and decisions in soccer.

“Today, the most important hire is no longer the 30-goal-a-season striker or an imposing brick wall of a defender. Instead, there’s an arms race for the person who identifies that talent.”

Performance data for professional soccer player Daniel Pereira

Image source: TFA

The Power of Moneyball and Soccernomics

Major shifts in sports analytics occurred following the publication of the books Moneyball in 2003 and Soccernomics in 2009.

Both books expose the power of data in finding undervalued players by highlighting undervalued metrics like niche playing tactics in soccer and on-base percentage in baseball. The end result? Smaller sports organizations can avoid overspending on flashy but less-impactful players and instead focus on acquiring hidden gems with specific skills at lower costs. By embracing data-driven strategies, smaller organizations gain a competitive edge against bigger spenders, proving that efficiency and smart player selection can trump financial muscle.
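A Moneyball-style screen can be as simple as ranking players by contribution per dollar spent. The sketch below uses entirely fabricated players and numbers:

```python
import pandas as pd

# Fabricated players: an on-field contribution metric vs. annual salary
players = pd.DataFrame({
    "player": ["Star A", "Star B", "Gem C", "Gem D"],
    "contribution": [9.0, 8.5, 7.5, 7.0],  # made-up composite metric
    "salary_m": [12.0, 10.0, 2.0, 1.5],    # salary in millions
})

# Moneyball-style value metric: contribution per million spent;
# the "hidden gems" rise to the top despite lower raw contribution
players["value"] = players["contribution"] / players["salary_m"]
print(players.sort_values("value", ascending=False))
```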

Predictive analytics has started to play a huge role in this process with organizations attempting to predict the performance of players, team compositions, and even coaches. Feature engineering is playing a pivotal role in advancing this field. How can we quantify the unquantifiable? For example, how can you measure a player’s relationship and attitude with his teammates and coaches? Something so subtle and nuanced has a huge effect on individual and team performance. 


Soccer organizations are placing greater emphasis on utilizing data analytics in their management and recruitment decision-making processes. The inclusion of new technologies to advance the science of data collection allows analysts to capture the minutiae of player performance across many aspects of the game. There is a huge need for intuitive, creative-thinking data analysts within the realm of soccer (and sports as a whole), and their analyses will play a pivotal role in how the game of soccer continues to evolve and thrive. 

Learn Data Science at Flatiron School

Flatiron’s Data Science Bootcamp can put you on the path to a career in the field in as little as 15 weeks. Download our syllabus to see what you can learn, or take a prep course for free. You can also schedule a 10-minute call with our admissions office to learn more about the school and its program.

Additional Reading