Understanding Data: A Beginner’s Guide to Data Types and Structures

Coding a function, an app, or a website is an act of creation no different from writing a short story or painting a picture. With very simple tools, we create something where before there was nothing. Painters have pigments. Writers have words. Coders have data types.

Data types govern just about every aspect of coding. Each type represents a specific kind of thing stored in a computer’s memory, and has different ways of being used in writing code. They also range in complexity from the humble integer to the sophisticated dictionary. This article will lay out the basic data types, using Python as the base language, and will also discuss some more advanced data types that are relevant to data analysts and data scientists.

Computers, even with the wondrous innovations of generative AI in the last few years, are still—just as their name suggests—calculators. The earliest computers were people who performed simple and complex calculations faster than the average person. (Many of the women who aided the code-breaking effort at Bletchley Park during World War II did exactly this kind of computational work.)

A hierarchical map of all the data types in Python. This article only discusses the most common ones: integers, floats, strings, lists, and dictionaries.
Source: Wikipedia

Simple Numbers

It’s appropriate, then, that the first data type we discuss is the humble integer. Integers are the whole numbers: 0, 1, 2, 3, and so on, up through 999,999,999,999 and beyond, including negative numbers. Different programming languages handle integers differently.

Python supports integers, usually denoted as int, with “arbitrary precision.” This means an integer can have as many digits as the computer has memory to store. Java, on the other hand, recognizes int as the set of 32-bit integers ranging from -2,147,483,648 to 2,147,483,647. (A bit is the smallest unit of computer information, representing the logical state of on or off, 1 or 0. A 32-bit system can store 2^32 different values.)
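A quick sketch of this in Python (not from the original article): an integer can grow far past the 32-bit limit with no special handling.

    # Python ints have arbitrary precision: they grow as large as memory allows
    big = 2 ** 100
    print(big)        # 1267650600228229401496703205376
    print(type(big))  # <class 'int'>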

The next step up the complexity ladder brings us to floating point numbers, or more commonly, floats. Floating point numbers approximate the real numbers, which include decimal fractions like 0.5, 3.14159, and -99.999. Python implements floats as 64-bit numbers, which limits their precision to roughly 15 to 17 significant digits.

With these two numeric data types, we can perform calculations. All the basic arithmetic operations are available in Python, along with exponentiation, rounding, and the modulo operation. (Denoted %, modulo returns the remainder of division. For example, 3 % 2 returns 1.) It is also possible to convert a float to an int, and vice versa.
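Here is a short sketch of those operations in Python:

    print(7 + 2)       # 9
    print(7 / 2)       # 3.5  (division always produces a float)
    print(2 ** 3)      # 8    (exponentiation)
    print(3 % 2)       # 1    (modulo: the remainder of division)
    print(round(3.7))  # 4
    print(int(3.7))    # 3    (float to int drops the fractional part)
    print(float(3))    # 3.0  (int to float)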

From Numbers to Letters and Words

Although “data” and “numbers” are practically synonymous in popular understanding, data types also exist for letters and words. These are called strings or string literals.

“Abcdefg” is a string.

So is:

“When in the course of human events it becomes necessary for one people to dissolve the political bands which have connected them with another.”

For that matter so is:

“Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur vitae lobortis enim.”

Computer languages don’t care whether the string in question is a letter, a word, a sentence, or complete gobbledygook. They only care, really, whether the thing being stored is a number or not a number, and specifically whether it is a number the computer can perform arithmetic on. 

An aside: to humans, “1234” and 1234 are roughly the same. We humans can infer from context whether something is a calculable number, like on a receipt, or a non-calculable number, like a ZIP code. “1234” is a string. 1234 is an integer. We call those quotation marks “delimiters” and they serve the same role for the computer that context serves for us humans.  

Strings come with their own methods that allow us to do any number of manipulations. We can capitalize words, reverse them, and slice them up in a multitude of ways. Just as with floats and ints, it’s also possible to tell the computer to treat a number as if it were a string and do those same manipulations on it. This is handy when, for example, you encounter a street address in data analysis, or if you need to clean up financial numbers that someone left a currency mark on.
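A few illustrative examples of these manipulations in Python (the street address and price here are made up):

    address = "123 main street"
    print(address.title())   # "123 Main Street"
    print(address[::-1])     # reversed: "teerts niam 321"
    print(address[0:3])      # a slice: "123"

    price = "$1,234.56"      # a number stored as a string, with a currency mark
    cleaned = float(price.replace("$", "").replace(",", ""))
    print(cleaned)           # 1234.56, now usable in arithmetic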

Encoding numbers and letters is a valuable tool, but we also need ways to check that things are what they claim to be. This is the role of the boolean data type, usually shortened to bool. This data type takes on one of two values: True or False, although it is also often represented as 1 or 0, respectively. The bool is also what the comparison operators (<, >, !=, ==) return when they evaluate the truth value of some expression, like 2 < 3 (True) or “Thomas” == “Tom” (False).
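A quick illustration in Python:

    print(2 < 3)              # True
    print("Thomas" == "Tom")  # False
    print(int(True))          # 1, since True corresponds to 1 and False to 0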

Ints, floats, and strings are the most basic data types that you will find across computing languages. And they are very useful for storing singular things: a number, a word. (Strictly speaking, an entire library could be encoded as a single string.)

These individual data types are great if we only want to encode simple, individual things, or make one set of calculations. But the real value of (digital) computers is their ability to do lots of calculations very very quickly. So we need some additional data types to collect the simpler ones into groups.

From Individuals to Collections

The first of these collection data types is the list. This is simply an ordered collection of objects of any data type. In Python, lists are set off by brackets ([]). (“Ordered” here means that each element keeps the position it was given, not that the list is sorted high-to-low or low-to-high.)

For example, the below is a list:

[1, 3.7, 2, 3.4, 4, 6.74, 5.0]

 This is also a list:

[“John”, “Mary”, “Sue”, “Alphonse”]

Even this is a list:

[1, “John”, 2.2, “Mary”, 3] 

It’s important to note that within a list, each element (or item) of the list still operates according to the rules of its data type. But know that the list as a whole also has its own methods. So it’s possible to remove elements from a list, add things to a list, sort the list, as well as do any of the manipulations the individual data types support on the appropriate elements (e.g., calculations on a float or an integer, capitalize a string).
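For example, a short sketch of a few list methods in Python:

    names = ["John", "Mary", "Sue", "Alphonse"]
    names.append("Rita")      # add an element to the end
    names.remove("Sue")       # remove an element
    names.sort()              # sort the list in place
    print(names)              # ['Alphonse', 'John', 'Mary', 'Rita']
    print(names[0].upper())   # elements keep their own methods: 'ALPHONSE'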

The last and probably most complicated of the basic data types is the dictionary, usually abbreviated dict. Some languages refer to this as a map. Dictionaries are collections of key-value pairs, set off by braces ({}). Inside a dictionary, each key is separated from its value by a colon, and the pairs are separated from one another by commas. Dictionaries enable a program to use one value to get another value. So, for example, a dictionary for a financial application might contain a stock ticker and that ticker’s last closing price, like so:

{“AAPL”: 167.83,
 “GOOG”: 152.62,
 “META”: 485.58}

In this example, “AAPL” is the key, 167.83 is the value. A program that needed the price of Apple’s stock could then get that value by calling the dictionary key “AAPL.” And as with lists, the individual items of the dictionary, both keys and values, retain the attributes of their own data types.
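A brief sketch of that lookup in Python (the added MSFT price is made up for illustration):

    prices = {"AAPL": 167.83, "GOOG": 152.62, "META": 485.58}
    print(prices["AAPL"])      # 167.83
    prices["MSFT"] = 425.22    # add a new key-value pair (made-up price)
    print(prices["AAPL"] * 2)  # values keep their float behavior: 335.66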

These pieces form the basics of data types in just about every major scripting language in use today. With these (relatively) simple tools you can code up anything from the simplest function to a full spreadsheet program or a Generative Pre-trained Transformer (GPT). 

Extending Data Types into Data Analysis

If we want to extend the data types that we have into even more complicated forms, we can get out of basic Python and into libraries like NumPy and pandas. These bring additional data types that expand Python’s basic capabilities into more robust data analysis and linear algebra, two essential foundations for machine learning and AI.

First we can look at the NumPy array. This handy data type allows us to set up matrices, though they really just look like lists, or lists of lists. However, arrays are less memory intensive than lists and allow more advanced calculations, which makes them much better suited to working with large datasets.
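A minimal sketch of an array in action:

    import numpy as np

    matrix = np.array([[1, 2, 3],
                       [4, 5, 6]])
    print(matrix.shape)   # (2, 3)
    print(matrix * 10)    # element-wise math, no loop required
    print(matrix.mean())  # 3.5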

If we combine a bunch of arrays, we wind up with a pandas DataFrame. For a data analyst or machine learning engineer working in Python, this is probably the most common tool. It can hold and handle all the other data types we have discussed. The pandas DataFrame is, in effect, a more powerful and efficient version of the Excel spreadsheet, and it can handle all the calculations you need for exploratory data analysis and data visualization.
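And a minimal sketch of a DataFrame, reusing the stock prices from earlier:

    import pandas as pd

    df = pd.DataFrame({
        "ticker": ["AAPL", "GOOG", "META"],
        "close": [167.83, 152.62, 485.58],
    })
    print(df.describe())          # quick summary statistics
    print(df[df["close"] > 200])  # filter rows, spreadsheet-style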

Data types in Python, or any programming language, form the basic manipulable unit. Everything a language uses to store information for future use is a data type of some kind, and each type has specific things it can store, and rules governing what it can do.

Learn Data Types and Structures in Flatiron’s Data Science Bootcamp

Ready to gain a deeper understanding of data types and structures to develop real-world data science skills? Our Data Science Bootcamp can help take you from basic concepts to in-demand applications. Learn how to transform data into actionable insights—all in a focused and immersive program. Apply today and launch your data science career!

Kendall McNeil: From Project Management to Data Science

Inspired by the power of data and a yearning to explore a field that aligned perfectly with her strengths, Kendall McNeil, a Memphis, TN resident, embarked on a strenuous career journey with Flatiron School. In this blog, we’ll delve into Kendall’s story – from her pre-Flatiron experience to the challenges and triumphs she encountered during the program, and ultimately, her success in landing a coveted Data Scientist role.

Before Flatiron: What were you doing and why did you decide to switch gears?

For eight years, Kendall thrived in the world of project management and research within the fields of under-resourced education and pediatric healthcare. Data played a crucial role in her work, informing her decisions and sparking a curiosity for Python’s potential to streamline processes. However, a passion for coding piqued her curiosity outside of work, compelling her to explore this field further.

“When I found Flatiron School, I was excited about the opportunity to level up my coding skills and gain a deeper understanding of machine learning and AI,” shared Kendall.

The scholarship opportunity she received proved to be a pivotal moment, encouraging her to strategically pause her career and fully immerse herself in Flatiron School’s Data Science program for four intensive months. This decision reflected not just a career shift, but a commitment to aligning her work with her true calling.

During Flatiron: What surprised you most about yourself and the learning process during your time at Flatiron School? 

Flatiron School’s rigorous curriculum challenged Kendall in ways she didn’t anticipate. Yet, the supportive environment and exceptional instructors like David Elliott made a significant difference.

“Big shout out to my instructor, David Elliott,” expressed Kendall in appreciation. “Throughout my time in his course, he skillfully balanced having incredibly high standards for us, while remaining approachable and accessible.”

Beyond the initial surprise of just how much she loved learning about data science, Kendall was particularly impressed by the program’s structure. The curriculum’s fast pace, coupled with the ability to apply complex concepts to hands-on projects, allowed her to build a strong portfolio that would become instrumental in her job search. The downloadable course materials also proved to be a valuable resource, something she continues to reference in her current role.

After Flatiron: What are you most proud of in your new tech career? 

Looking back at her Flatiron experience, Kendall highlights her capstone project as a source of immense pride. The project involved creating an AI model designed to detect up to 14 lung abnormalities in chest X-rays. This innovation has the potential to address a critical challenge in healthcare – the high rate (20-30%) of false negatives in chest X-ray diagnoses.

“The model, still a work in progress, boasts an 85% accuracy rate and aims to become a crucial ally for healthcare providers, offering a second opinion on these intricate images by identifying subtle patterns that may be harder to detect with the human eye,” explained Kendall.

However, her pride extends beyond the technical aspects of the project. By leveraging Streamlit, Kendall successfully deployed the model onto a user-friendly website, making it accessible to the everyday user. This focus on accessibility aligns with her core belief in the importance of making complex data and research readily available.

Within just six weeks of completing the program, she received multiple job offers – a testament to the skills and foundation she acquired at Flatiron School. With support from her Career Coach, Sandra Manley, Kendall navigated the interview process with ease. Currently, Kendall thrives in her new role as a Data Scientist at City Leadership. She’s recently embarked on a “data listening tour” to understand the organization’s data needs and explore possibilities for future innovation.

“It has been a joy and, again, I really feel that I have discovered the work I was made for!” concluded Kendall.

Kendall invites you to follow her journey on social media: GitHub Portfolio | Blog | LinkedIn

Summary: Unleashing Your Potential at Flatiron School

Kendall’s story is a shining example of how Flatiron School empowers individuals to pursue their passions and embark on fulfilling tech careers. The program’s immersive curriculum, coupled with exceptional instructors and a focus on practical application, equips students with the skills and knowledge they need to thrive in the data science field.

Inspired by Kendall’s story? Ready to take charge of your future and embark on your own transformative journey?

Apply Now to join Flatiron School’s Data Science program and connect with a community of like-minded individuals. You could be the next success story we celebrate! And for even more inspiring stories about career changers like Kendall, visit the Flatiron School blog.

Intro to Predictive Modeling: A Guide to Building Your First Machine Learning Model

Predictive modeling is a process in data science that forecasts future outcomes based on historical data and statistical algorithms. It involves building mathematical models that learn patterns from past data to make predictions about unknown or future events. These models analyze various variables or features to identify relationships and correlations, which are then used to generate predictions. Well over half of the Flatiron School’s Data Science Bootcamp program involves learning about various predictive models.

Applications of Predictive Modeling

One common application of predictive modeling is in finance, where it helps forecast stock prices, predict market trends, and assess credit risk. In marketing, predictive modeling helps companies target potential customers more effectively by predicting consumer behavior and preferences. For example, companies can use customer data to predict which products a customer is likely to purchase next or which marketing campaigns will yield the highest return on investment.

Healthcare is another field that uses predictive modeling. Predictive modeling plays a vital role in identifying patients at risk of developing certain diseases. It also helps improve treatment outcomes and optimize resource allocation. By analyzing patient data, such as demographics, medical history, and lifestyle factors, healthcare providers can predict potential health issues and intervene early to prevent or manage them effectively.

Manufacturing and logistics widely use predictive modeling to optimize production processes, predict equipment failures, and minimize downtime. By analyzing data from sensors and machinery, manufacturers can anticipate maintenance needs and schedule repairs before breakdowns occur, reducing costs and improving efficiency.

Overall, predictive modeling has diverse applications across various industries, helping businesses and organizations make more informed decisions, anticipate future events, and gain a competitive advantage in the marketplace. Its ability to harness the power of data to forecast outcomes makes it a valuable tool for driving innovation and success in today’s data-driven world.

The Steps for Building a Predictive Model

Below is a step-by-step guide to building a simple predictive machine learning model using Python pseudocode. Python is a versatile, high-level programming language known for its simplicity and readability, making it an excellent choice for beginners and experts alike. Its extensive range of libraries and frameworks, particularly in fields such as data science, machine learning, artificial intelligence, and scientific computing, has solidified its place as a cornerstone in the data science community. While Flatiron School Data Science students learn other technical skills and tools, such as dashboards and SQL, the primary language that students learn and use is Python.

Step 1

In Step 1 below (in the gray box), Python libraries are imported. A Python library is a collection of functions and methods that allows you to perform many actions without writing the code yourself. It is a reusable chunk of code that you can use by importing it into your program, saving time and effort in coding from scratch. Libraries in Python cover a vast range of programming needs, including data manipulation, visualization, machine learning, network automation, web development, and much more.

The two most widely used Python libraries are NumPy and pandas. The former adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level functions to operate on these arrays. The latter is a high-level data manipulation tool built on top of the Python programming language. It is most well-suited for structured data operations and manipulations, akin to SQL but in Python. 

The third imported Python library is scikit-learn, which is an open-source machine learning library that provides a wide range of supervised and unsupervised learning algorithms. It is built on NumPy, SciPy, and Matplotlib, offering tools for statistical modeling, including classification, regression, clustering, and dimensionality reduction. In data science, scikit-learn is extensively used for developing predictive models and conducting data analysis. Its simplicity, efficiency, and ease of integration with other Python libraries make it an essential tool for machine learning practitioners and researchers.

Code chunk for importing libraries
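The gray-box code itself isn’t reproduced here, but a minimal sketch of the Step 1 imports, based on the libraries described above, might look like this:

    import numpy as np
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error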

Step 2

Now that the libraries have been imported in Step 1, the data needs to be brought in—as can be seen in Step 2. Since we’re considering predictive modeling, we’ll use the feature variables to predict the target variable. 

In a dataset for a predictive model, feature variables (also known as predictors or independent variables) are the input variables that are used to predict the outcome. They represent the attributes or characteristics that help the model learn patterns to make predictions. For example, in a dataset for predicting house prices, feature variables might include:

  • Square_Feet: The size of the house in square feet
  • Number_of_Bedrooms: The number of bedrooms in the house
  • Age_of_House: The age of the house in years
  • Location_Rating: A rating representing the desirability of the house’s location

The target variable (also known as the dependent variable) is the output variable that the model is trying to predict. Continuing with our housing example, the target variable would be:

  • House_Price: The price of the house

In this scenario, the model learns from the feature variables (Square_Feet, Number_of_Bedrooms, Age_of_House, Location_Rating) to accurately predict the target variable (House_Price).

Code chunk for loading and preprocessing data
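A sketch of what Step 2 might contain, using the placeholder names ('your_dataset.csv', 'feature1', and so on) referenced in the summary at the end of this post:

    # Load the data and separate the feature variables from the target variable
    data = pd.read_csv('your_dataset.csv')
    X = data[['feature1', 'feature2']]   # feature variables
    y = data['target_variable']          # target variable

    # Hold out part of the data (here 20%, an arbitrary choice) as a test set
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )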

Step 3

Note that we split the dataset into training and test sets in Step 2. We did this to evaluate the predictive model’s performance on unseen data, ensuring it can generalize well beyond the data it was trained on. This split helps identify and mitigate overfitting, where a model performs well on its training data but poorly on new, unseen data, by providing a realistic assessment of how the model is likely to perform in real-world scenarios.

Now comes the key moment in Step 3, where we use our statistical learning model. In this case, we’re using multiple linear regression, an extension of simple linear regression designed to predict an outcome based on multiple independent variables. It fits a linear equation to the observed data in which the target variable is modeled as a linear combination of two or more feature variables, with a separate coefficient (slope) for each independent variable plus an intercept. This approach allows for the examination of how various feature variables simultaneously affect the outcome, providing a more comprehensive analysis of the factors influencing the dependent variable.

Code chunk for choosing and training the model
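A minimal sketch of Step 3, continuing with the same placeholder names:

    # Fit a multiple linear regression model on the training data
    model = LinearRegression()
    model.fit(X_train, y_train)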

Step 4

In Step 4, we evaluate the model to find out how well it fits the data. There are a myriad of metrics that one can use to evaluate predictive learning models. In the pseudocode below, we use the MSE, or the mean squared error.

Code chunk for evaluating the model
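A minimal sketch of Step 4:

    # Generate predictions for the test set and measure the mean squared error
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print(f"Mean squared error: {mse:.2f}")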

The MSE is a commonly used metric to evaluate the performance of a regression model. It measures the average squared difference between the actual values (observed values) and the predicted values generated by the model. Mathematically, it is calculated by taking the average of the squared differences between each predicted value and its corresponding actual value. The formula for MSE is:

Formula for MSE
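Written out in plain text, the formula is:

    MSE = (1/n) × Σ (yi − ŷi)²,  with the sum running over i = 1 to n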

In this formula, 

  • n is the number of observations
  • yi represents the actual value of the dependent variable for the ith observation
  • ŷi represents the predicted value of the dependent variable for the ith observation

A lower MSE value indicates that the model’s predictions are closer to the actual values, suggesting a better fit of the model to the data. Conversely, a higher MSE value indicates that the model’s predictions are further away from the actual values, indicating poorer performance.

Step 5

At this point, one would usually want to tune (i.e., improve on) the model. But for this introductory explanation, Step 5 will be to use our model to make predictions.

Code chunk for predictions
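A minimal sketch of Step 5 (the new feature values here are placeholders):

    # Use the trained model to predict the target for new, unseen data
    new_data = pd.DataFrame({'feature1': [1500], 'feature2': [3]})
    predictions = model.predict(new_data)
    print(predictions)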

Summary of Predictive Modeling

The pseudocode in Steps 1 through 5 shows the basic steps involved in building a simple predictive machine learning model using Python. You can replace placeholders like `’your_dataset.csv’`, `’feature1’`, `’feature2’`, etc., with actual data and feature names in your dataset. Similarly, you can replace `’target_variable’` with the name of the target variable you are trying to predict. Additionally, you can experiment with different models, preprocessing techniques, and evaluation metrics to improve the model’s performance.

 Predictive modeling in data science involves using statistical algorithms and machine learning techniques to build models that predict future outcomes or behaviors based on historical data. It encompasses various steps, including data preprocessing, feature selection, model training, evaluation, and deployment. Predictive modeling is widely applied across industries for tasks such as forecasting, classification, regression, anomaly detection, and recommendation systems. Its goal is to extract valuable insights from data to make informed decisions, optimize processes, and drive business outcomes. 

Effective predictive modeling requires a combination of domain knowledge, data understanding, feature engineering, model selection, and continuous iteration to refine and improve model performance over time, leading to actionable insights.

Learn About Predictive Modeling (and More) in Flatiron’s Data Science Bootcamp

Forge a career path in data science in as little as 15 weeks by attending Flatiron’s Data Science Bootcamp. Full-time and part-time opportunities await, and potential career paths the field holds include ones in data analysis, AI engineering, and business intelligence analysis. Apply today or schedule a call with our Admissions team to learn more!

Mapping Camping Locations for the 2024 Total Solar Eclipse

The data visualizations in this blog post—which are tied to best camping locales for viewing the 2024 total solar eclipse—are not optimized for a mobile screen. For the best viewing experience, please read on a desktop or tablet.

I once read about a married couple who annually plan vacations to travel the world in pursuit of solar eclipses. They spoke about how, regardless of their location, food preferences, or language abilities, they always managed to share a moment of awe with whoever stood near them as they gazed up at the hidden sun.

While I can’t speak to the experience of viewing an eclipse abroad, I did travel to the path of totality for a solar eclipse in 2017, and I can confirm the feeling of awe and the sense of shared experience with strangers. Chasing eclipses around the world isn’t something I can easily squeeze into my life, but when an eclipse is nearby, I make an effort to go see it. 

On April 8, 2024, a solar eclipse will pass over the United States. It will cast the moon’s shadow over the states of Texas, Oklahoma, Arkansas, a sliver of Tennessee, Missouri, Kentucky, Illinois, Indiana, Ohio, Michigan, Pennsylvania, New York, Vermont, New Hampshire, and Maine.

As I’ve been making plans for this eclipse, I’ve been using NASA’s data tool for exploring where in the path of totality I might view the eclipse from. This tool is exceptional and includes the time the eclipse will begin by location, a simulation of the eclipse’s journey, and a weather forecast. I have several friends who have traveled to see an eclipse only to be met with a gray cloudy sky, so keeping an eye on the weather is very important.

But, as I was using this tool, I found myself wanting to know what camping options fall within the path of totality for the 2024 total solar eclipse. I, like many, prefer to make a small camping vacation out of the experience, and the location of parks is information not provided by NASA’s data tool. So I found a dataset detailing the location of 57,000 parks in the United States, isolated the parks that fall within the path of totality, and plotted them.
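My full data manipulation and visualization code is linked at the end of this post; as a rough sketch of the idea, assuming the parks data and an eclipse-path boundary are loaded with geopandas (the file names here are placeholders):

    import geopandas as gpd

    # Placeholder file names; see the data and code links below for the real sources
    parks = gpd.read_file("us_parks.geojson")
    path = gpd.read_file("2024_path_of_totality.shp").to_crs(parks.crs)

    # Keep only the parks whose locations fall within the path of totality
    parks_in_path = gpd.sjoin(parks, path, predicate="within")

    # Plot the path with the qualifying parks on top
    ax = path.plot(color="lightgray")
    parks_in_path.plot(ax=ax, color="green", markersize=5)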

Below is the final visualization, with an added tooltip for viewing the details of the park.

For those interested in the code for this project, check out the links to the data and my data manipulation and visualization code.

Happy eclipsing, everyone!

Gain Data Science Skills at Flatiron School

Learn the data science skills that employers are after in as little as 15 weeks at Flatiron School. Our Data Science Bootcamp offers part-time and full-time enrollment opportunities for both onsite and online learning. Apply today or download the syllabus for more information on what you can learn. Interested in seeing the types of projects our students complete upon graduation? Attend our Final Project Showcase.

Introduction to Natural Language Processing (NLP) in Data Science

Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) and linguistics that focuses on the interaction between computers and human languages. It encompasses a range of techniques aimed at enabling computers to understand, interpret, and generate human language in a manner that is both meaningful and contextually relevant. 

In data science, NLP plays a pivotal role in extracting insights from vast amounts of textual data. Through techniques such as text classification, sentiment analysis, named entity recognition, and language translation, NLP empowers data scientists to analyze and derive actionable insights from unstructured text data sources such as social media, customer reviews, emails, and news articles. By harnessing the power of NLP, data scientists can uncover patterns, trends, and sentiments within textual data. This enables organizations to make data-driven decisions and enhance various aspects of their operations, from customer service to product development and market analysis.

NLP is fundamental to generative AI models like ChatGPT. Natural language processing techniques enable these models to understand and generate human-like text, making them capable of engaging in meaningful conversations with users. NLP provides the framework for tasks such as language understanding, sentiment analysis, summarization, and language generation. All are essential components of generative AI systems.

Applications of NLP

NLP techniques are extensively utilized in text classification and sentiment analysis, offering a wide array of applications across various industries.

Text Classification

NLP enables automatic categorization of textual data into predefined classes or categories. Applications include:

  • Spam detection: NLP algorithms can classify emails or messages as spam or non-spam, helping users manage their inbox efficiently.
  • Topic classification: NLP models categorize news articles, research papers, or social media posts into relevant topics, aiding in content organization and information retrieval.
  • Language identification: NLP models can identify the language of a given text, which is useful for multilingual platforms and content analysis.

Sentiment Analysis

NLP techniques are employed to analyze the sentiment or emotion expressed in textual data, providing valuable insights for decision-making. Applications include:

  • Brand monitoring: Sentiment analysis helps businesses monitor online conversations about their brand, products, or services. This enables them to gauge public perception and address potential issues promptly.
  • Customer feedback analysis: NLP algorithms analyze customer reviews, surveys, and social media comments to understand customer sentiment towards specific products or services, facilitating product improvement and customer satisfaction.
  • Market research: Sentiment analysis aids in analyzing public opinion and sentiment towards specific topics or events, providing valuable insights for market research, trend analysis, and forecasting.
  • Social media analysis: NLP techniques are utilized to analyze sentiment in social media posts, tweets, and comments, enabling businesses to track customer sentiment, identify influencers, and engage with their audience effectively.

NLP Techniques

NLP encompasses a variety of techniques designed to enable computers to understand and process human languages. Two fundamental techniques in NLP are tokenization and stemming, which play crucial roles in text preprocessing and analysis.

Tokenization

Tokenization is the process of breaking down a piece of text into smaller units (called tokens). These tokens can be words, phrases, or other meaningful elements. The primary goal of tokenization is to divide the text into individual units for further analysis. There are different tokenization strategies, including:

  • Word tokenization divides the text into words or word-like units. For example, the sentence “The quick brown fox jumps over the lazy dog” would be tokenized into [“The”, “quick”, “brown”, “fox”, “jumps”, “over”, “the”, “lazy”, “dog”].
  • Sentence tokenization splits the text into sentences. For instance, the paragraph “Natural Language Processing (NLP) is a fascinating field. It involves analyzing and understanding human language” would be tokenized into [“Natural Language Processing (NLP) is a fascinating field.”, “It involves analyzing and understanding human language.”].
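A quick sketch of both strategies using the NLTK library (one common choice; the post itself doesn’t prescribe a specific library):

    import nltk
    nltk.download("punkt")  # one-time download of the tokenizer models
    from nltk.tokenize import word_tokenize, sent_tokenize

    print(word_tokenize("The quick brown fox jumps over the lazy dog"))
    print(sent_tokenize("Natural Language Processing (NLP) is a fascinating field. "
                        "It involves analyzing and understanding human language."))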

Stemming 

Stemming is the process of reducing words to their root or base form, known as the stem. The goal of stemming is to normalize words so that different forms of the same word are treated as identical. Stemming algorithms apply heuristic rules to remove suffixes and prefixes from words. For example:

  • Original Word: “Running”
  • Stemmed Word: “Run”
  • Original Word: “Jumped”
  • Stemmed Word: “Jump”
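A matching sketch with NLTK’s Porter stemmer (again, one common choice):

    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()
    print(stemmer.stem("Running"))  # run
    print(stemmer.stem("Jumped"))   # jump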

Stemming is particularly useful in tasks such as text mining, information retrieval, and search engines. Why? Because reducing words to their base forms can improve indexing and retrieval accuracy. Both tokenization and stemming are essential preprocessing steps in many NLP applications, including text classification, sentiment analysis, machine translation, and information retrieval. These techniques help transform raw textual data into a format suitable for further analysis and modeling, facilitating the extraction of meaningful insights from large volumes of text data.

Natural Language Processing (NLP) Resources

Given the comprehensive overview of NLP’s applications and techniques, several resources can significantly aid in deepening your understanding and skills in this field. Books such as Natural Language Processing in Action by Lane, Howard, and Hapke, and Speech and Language Processing by Jurafsky and Martin provide foundational knowledge and practical examples. These texts are excellent for understanding the underlying principles of NLP. They’re also great for reference on specific topics like tokenization, stemming, and machine learning models used in NLP. Regardless of which NLP resource is used, the key is to practice coding the models.

Learn More about NLP in Flatiron’s Data Science Bootcamp

Flatiron School’s Data Science Bootcamp teaches natural language processing, data analysis and engineering, machine learning fundamentals and much more. Full-time and part-time enrollment opportunities await! Apply today or schedule a call with Admissions to learn more about what Flatiron can do for you and your career. 

Enhancing Your Tech Career with Remote Collaboration Skills

Landing a career in the tech industry requires more than just technical/hard skills; it requires soft skills like effective communication, adaptability, time management, problem-solving abilities, and remote collaboration skills. Remote collaboration is especially key for those who work in tech; according to U.S. News & World Report, the tech industry leads all other industries with the highest percentage of remote workers.

At Flatiron School, we understand the importance of these skills in shaping successful tech professionals. Hackonomics, our AI-focused hackathon event happening between March 8 and March 25, will see participants sharpen remote collaboration skills (and many others) through the remote team-based building of an AI-driven personal finance platform. We’ll reveal more about Hackonomics later in the article; right now, let’s dive deeper into why remote collaboration skills are so important in today’s work world.

Mastering Remote Collaboration Skills

Remote collaboration skills are invaluable in today’s digital workplace, where teams are often distributed across different locations and time zones. Whether you’re working on a project with colleagues halfway across the globe or collaborating with clients remotely, the ability to effectively communicate, problem-solve, and coordinate tasks in a remote work setting is essential for success. Here are some other key reasons why this skill is becoming so important. 

Enhanced Productivity and Efficiency

Remote collaboration tools and technologies empower teams to communicate, coordinate, and collaborate in real-time, leading to increased productivity and efficiency. With the right skills and tools in place, tasks can be completed more quickly, projects can progress smoothly, and goals can be achieved with greater ease.

Flexibility and Work-life Balance

Remote work offers unparalleled flexibility, allowing individuals to balance their professional and personal lives more effectively. However, this flexibility comes with the responsibility of being able to collaborate effectively from anywhere, ensuring that work gets done regardless of physical location.

Professional Development and Learning Opportunities

Embracing remote collaboration opens doors to a wealth of professional development and learning opportunities. From mastering new collaboration tools to honing communication and teamwork skills in virtual settings, individuals can continually grow and adapt to the evolving demands of the digital workplace.

Resilience in the Face of Challenges

Events such as the COVID-19 pandemic—and the massive shift to at-home work it caused—have highlighted the importance of remote collaboration skills. When faced with unforeseen challenges or disruptions, the ability to collaborate remotely ensures business continuity and resilience, enabling teams to adapt and thrive in any environment.

Join Us for the Hackonomics Project Showcase and Awards Ceremony

Come see the final projects born out of our Hackonomics teams’ remote collaboration experiences when our Hackonomics 2024 Showcase and Awards Ceremony happens online on March 28. The event is free to the public and offers those interested in attending a Flatiron School bootcamp a great opportunity to see the types of projects they could work on should they enroll.

The 8 Things People Want Most from an AI Personal Finance Platform

Great product design is one of those things you just know when you see it, and more importantly—use it. It’s not just about being eye-catching; it’s about serving a real purpose and solving a real problem—bonus points if you can solve that problem in a clever way. If there ever was a time to build a fintech app, that time is now. The market is ripe, the problems to solve are plenty, and the tools and resources are readily available. Flatiron School Alumni from our Cybersecurity, Data Science, Product Design, and Software Engineering bootcamps have been tasked to help me craft Money Magnet, an AI personal finance platform that solves common budget-making challenges. They’ll tackle this work during Hackonomics, our two-week-long hackathon that runs from March 8 to March 25.

There is one goal in mind: to help individuals and families improve their financial well-being through an AI financial tool.

A loading screen mockup for AI personal finance platform Money Magnet

My Personal Spreadsheet Struggle

The concept for Money Magnet sprang from personal frustration and mock research around user preferences in AI finance. As a designer, I often joke, “I went to design school to avoid math.” Yet, ironically, I’m actually quite adept with numbers. Give me a spreadsheet and 30 minutes, and I’ll show you some of the coolest formulas, conditional formats, and data visualization charts you’ve ever seen.

Despite this, in my household, the responsibility of budget management falls squarely to my partner. I prefer to stay blissfully unaware of our financial details—knowing too much about our funds admittedly tends to lead to impulsive spending on my part. However, occasionally I need to access the budget, whether it’s to update it for an unexpected expense or to analyze historical data for better spending decisions.

We’re big on goal-setting in our family—once we set a goal, we stick to it. We have several future purchases we’re planning for, like a house down payment, a new car, a vacation, and maybe even planning for children. 

But here’s the catch: None of the top AI financial tools on the market incorporate the personal finance AI features that Money Magnet proposes. Families need an AI personal finance platform that looks at spending patterns from the past and projects into the future to tell users when the budget will get tighter. The product should also be easy to use and give every family member access to make changes without fear of wrecking the budget.

For more context, each year, my partner forecasts a detailed budget for us. We know some expenses fluctuate—a grocery trip might cost $100 one time and $150 the next. We use averages from the past year to estimate and project those variable expenses. This way, we manage to live comfortably without having to scale back in tighter months, fitting in bigger purchases when possible, and working towards an annual savings goal.

Top financial apps chart from Sensor Tower

But here’s where the challenge lies: My partner, as incredible as he is, is not a visualist. He can navigate a sea of spreadsheet cells effortlessly, which is something I struggle with (especially when it comes to budgeting). I need a big picture, ideally represented in a neat, visual chart or graph that clearly illustrates our financial forecast.

Then there’s the issue of access and updates. Trying to maneuver a spreadsheet on your phone in the middle of a grocery store is far from convenient. And if you make an unplanned purchase, updating the sheet without disrupting the formulas can be a real hassle, especially on a phone. This frustration made me think, “There has to be a better solution!”

Imagining the Ultimate AI Personal Finance Platform

Imagine an AI personal finance platform that “automagically” forecasts the future, securely connects to your bank and credit cards to pull transaction histories, and creates a budget considering dynamic and bucketed savings goals. This dream app would translate data into a clear dashboard, visually reporting on aspects like spending categories, monthly trends in macro and micro levels, amounts paid to interest, debt consolidation plans, and more.

It’s taken eight years of experiencing my partner’s budget management to truly understand a common struggle that many other families in the U.S. face: Advanced spreadsheet functions, essential in accounting and budgeting, are alien to roughly 73% of U.S. workers.

The extent of digital skills in the U.S. workforce according to OECD PIAAC survey data. Image Source: Information Technology and Innovation Foundation

Money Magnet aims to automate 90% of the budgeting process by leveraging AI recommendations about users’ personal finances to address eight key findings outlined in a mock research study based on some of the challenges I faced when developing a budget of my own.

Features to Simplify Your Finances

This dream budgeting tool is inspired by my own financial journey and the collective wish list of what an ideal personal finance assistant should be. Here’s a snapshot of the personal finance AI features that aim to position Money Magnet as one of the top AI financial tools on the market:

  1. Effortless Onboarding: Starting a financial journey shouldn’t be daunting. Money Magnet envisions a platform where setting up accounts and syncing banking information is as quick and effortless as logging into the app, connecting your bank accounts, and establishing some savings goals (if applicable).
  2. Unified Account Dashboard: Juggling multiple banking apps and credit card sites can be a circus act, and trying to merge those separate ecosystems as a consumer is nearly impossible. Money Magnet proposes a unified dashboard, a one-stop financial overview that could declutter your digital financial life.
  3. Personalized AI Insights: Imagine a platform that knows your spending habits better than you do, offering bespoke guidance to fine-tune your budget. Money Magnet aims to be that savvy financial companion, using AI to tailor its advice just for you.
  4. Vivid Data Visualization: For those of us who see a blur of numbers on statements and spreadsheets, Money Magnet could paint a clearer picture with vibrant graphs and charts—turning the abstract into an understandable, perceivable, engaging, and dynamic visual that encourages you to monitor the trends.
  5. Impenetrable Security: When dealing with personal and financial details, security is non-negotiable. Money Magnet will prioritize protecting your financial data with robust encryption and authentication protocols, so your finances are as secure as Fort Knox.
  6. Intelligent Budget Optimization and Forecasting: No more cookie-cutter budget plans that force your spending to fit conventional categorization molds! Money Magnet will learn your preferences and forecast from your historical spending, suggesting ways to cut back on lattes or add to your savings—all personalized to improve your financial well-being based on your real-world spending and projected into the future to avoid pinch points.
  7. Smooth Bank Integrations: Another goal of Money Magnet is to eliminate the all-too-common bank connection hiccups where smaller banks and credit unions don’t get as much connectivity as the larger banks, ensuring a seamless link between your financial institutions and the app.
  8. Family Financial Management: Lastly, Money Magnet should be a tool where managing family finances is a breeze. Money Magnet could allow for individual family profiles, making it easier to teach kids about money and collaborate on budgeting without stepping on each other’s digital toes or overwriting a budget. It’s important for those using Money Magnet to know it can’t be messed up, and that any action can always be reverted.

See the Money Magnet Final Projects During Our Closing Ceremony on March 28

Attend the Hackonomics 2024 Showcase and Awards Ceremony on March 28 and see how our participating hackathon teams turned these eight pillars of financial management into a reality through their Money Magnet projects. The event is online, free of charge, and open to the public. Hope to see you there!

Decoding Data Jargon: Your Key to Understanding Data Science Terms

There are a myriad of data science terms used in the data science field, where statistics and artificial intelligence are employed to discover actionable insights from data. For example, data science can be used by banks for fraud detection, or for content recommendations from streaming services. This post focuses on some of the key terms from statistics that are commonly used within data science and then concludes with a few remarks on using data science terminology correctly.

Definitions of Key Data Science Terms

Let’s look at some of the key terms in data science that you need to have a grasp on.

Numeric and Categorical Data

Data can be either numeric (or quantitative) or categorical (or qualitative). Numeric data represents quantities or amounts. Categorical data represents attributes that can be used to group or label individual items. If a student is a first-generation college student who is taking 17 semester units, then the student’s educational generation is categorical and the number of units is numeric.

Types of Statistics

When one is introduced to the use of statistics in data science, terms generally fall within one of the two main branches of statistics that serve different purposes in the analysis of data: descriptive statistics and inferential statistics.

Descriptive statistics summarize and organize characteristics of a data set. They give a snapshot of the data through numbers, charts, and graphs without making conclusions beyond the data analyzed or making predictions.

Descriptive Statistics: Mean, Median, Mode, Standard Deviation, and Correlation

Measures of central tendency provide a central point around which the data is distributed and measures of variability describe the spread of the data. The two most common measures of central tendency for numeric data are the mean and the median. The most common measure of central tendency for categorical data is the mode. The mean in data science is the average value (sum all of the values and divide by the number of observations). The median in data science is the middle value, and the mode is the most common value. 

Note that while the mode is generally used for categorical data, numeric data can also have modes. Consider the following made-up data set that is listed in order for simplicity: 2, 3, 7, 9, 9. The mode is 9 since it is the only value that shows up more than once. The median is 7 since it is precisely the middle value, and the mean is 30/5 = 6. The most used measure of variability is the standard deviation, which can be thought of as the average distance that each observation is from the mean. In the toy example noted above, the (sample) standard deviation is about 3.32. So on average, each number of the data set is about 3.32 away from the mean.
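Python’s built-in statistics module can confirm these values (a quick sketch):

    import statistics

    data = [2, 3, 7, 9, 9]
    print(statistics.mean(data))    # 6
    print(statistics.median(data))  # 7
    print(statistics.mode(data))    # 9
    print(statistics.stdev(data))   # 3.3166..., the sample standard deviation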

All of the aforementioned descriptive statistics are for univariate data (i.e., data with only one variable). More often in data science, we look at data that is multivariate. For instance, one could have two variables—the height and weight of NBA players. A descriptive statistic that describes the relationship between these variables is called the correlation. The correlation is a value between -1 and 1 and represents the strength and direction of the relationship.
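For example, with NumPy and a handful of made-up height and weight values:

    import numpy as np

    heights = [75, 78, 80, 82, 84]       # inches (made-up values)
    weights = [180, 200, 215, 230, 250]  # pounds (made-up values)
    print(np.corrcoef(heights, weights)[0, 1])  # close to 1: a strong positive relationship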

Inferential Statistics: Confidence Intervals and Hypothesis Tests

Now let’s turn to some key terms from inferential statistics that are used in data science. There are two main types of inferential statistics: confidence intervals and hypothesis tests. Confidence intervals give an estimate of an unknown population value. Hypothesis tests determine if a data set is significantly different from an assumed value regarding the population at a certain level of confidence. 

For example, a confidence interval that is estimating the average (mean) height of NBA players in inches could be (75 inches, 81 inches). Whereas for a hypothesis test we can claim that the average height of NBA players is 78 inches and then test to see if our data differs substantially from that value. If our data set has a sample mean of 74 inches, then it is likely that this shows statistical significance because our mean is so different from the assumed population mean of 78 inches. While if our data set has a sample mean of 77 inches, then it is unlikely that this will show statistical significance since our sample mean and the assumed population mean are close.

For a much more technical overview of statistical significance, confidence intervals, and hypothesis testing, please see our post “Rejecting the Null Hypothesis Using Confidence Intervals.”

How to Use Data Science Terms Wisely

Time now for an anecdote. A friend of mine—let’s call him Yinzer—was giving a presentation to his boss. He was tasked with presenting descriptive statistics on the company’s data. He included in his presentation a descriptive statistic called the kurtosis since that value was produced by the software. Yinzer’s boss asked him, “What is kurtosis?” Yinzer didn’t know and was unable to answer the question. 

The moral of the story is: only use data science terms (such as mean, median, standard deviation, correlation, and hypothesis testing) if you are confident in being able to explain them.

Some Additional Tips for Using Data Science Terminology

Here are some additional tips for using data science terminology if you are a beginner in the field:

Focus on understanding, not memorizing: Don’t try to memorize every term you encounter. Instead, focus on grasping the underlying concepts and how they relate to each other. This will allow you to learn new terms organically as you progress.

Practice with real data: The best way to solidify your understanding is to apply it. Find beginner-friendly datasets online and use them to practice basic data cleaning, analysis, and visualization. This will expose you to terminology in a practical setting.

Engage with the data science community: Join online forums, attend meetups, or connect with other data science beginners. Discussing concepts and terminology with others can solidify your understanding and expose you to new terms in a collaborative environment.

Learn Data Science at Flatiron in 15 Weeks

Full-time students in Flatiron’s Data Science Bootcamp can graduate in under four months with the skills needed to land data analyst, AI engineer, and data scientist jobs. Book a 10-minute call with our Admissions team to learn more.

How to Achieve Portfolio Optimization With AI

Here’s a fact: Employers are seeking candidates with hands-on experience and expertise in emerging technologies. Portfolio optimization using Artificial Intelligence (AI) has become a key strategy for people looking to break into the tech industry. Let’s look at some of the advantages of having an AI project in a portfolio, and how portfolio optimization with AI can be a possible game changer in regards to getting your foot in the door at a company.

The Pros of Having AI Projects in a Portfolio

For people seeking to transition into the tech industry, having AI projects in their portfolios can be a game-changer when it comes to landing coveted roles and advancing their careers. By showcasing hands-on experience with AI technologies and their applications in real-world projects, candidates can demonstrate their readiness to tackle complex challenges and drive innovation in any industry. Employers value candidates who can leverage AI to solve problems, optimize processes, and deliver tangible results, making AI projects a valuable asset for aspiring tech professionals.

Integrating AI projects into a portfolio is quickly becoming a cornerstone of success for tech job seekers. However, portfolio optimization with AI involves more than just adopting the latest technology. It requires a strategic business approach and a deep understanding of Artificial Intelligence. Below are details about Hackonomics, Flatiron School’s AI-powered budgeting hackathon.

The Components of Flatiron’s AI Financial Platform Hackathon

Identifying the Right Business Problem

The Hackonomics project revolves around cross-functional teams of recent Flatiron graduates building an AI-driven financial platform to increase financial literacy and provide individualized financial budgeting recommendations for customers. Identifying the right business problem entails understanding the unique needs and challenges of a target audience, ensuring that a solution addresses critical pain points and that the utilization of AI delivers tangible value to users.      

AI Models

At the core of Hackonomics are machine learning models meticulously designed to analyze vast amounts of financial data. These models will enable the uncovering of valuable insights into user spending patterns, income sources, and financial goals, laying the foundation for personalized recommendations and budgeting strategies.

Software and Product Development

As students develop their Hackonomics projects, continuous product development and fine-tuning are essential for optimizing performance and usability. This involves iterating on platform features (including UI design and SE functionality) and refining AI algorithms to ensure that the platform meets the evolving needs of users and delivers a seamless and intuitive experience.

Security and Encryption

Ensuring the security and privacy of users’ financial data is paramount. The Hackonomics project incorporates robust security measures, including encryption techniques, to safeguard sensitive information from outside banking accounts that need to be fed into the platform. Additionally, multi-factor authentication (MFA) adds an extra layer of protection, mitigating the risk of unauthorized access and enhancing the overall security posture of our platform.

Join Us at the Hackonomics Project Showcase on March 28

From March 8 to March 25, graduates of Flatiron School’s Cybersecurity, Data Science, Product Design, and Software Engineering bootcamps will collaborate to develop fully functioning AI financial platforms that analyze user data, provide personalized recommendations, and empower individuals to take control of their financial futures.

The Hackonomics outcomes are bound to be remarkable. Participants will create a valuable addition to their AI-optimized project portfolios and gain invaluable experience and skills that they can showcase in job interviews and beyond.

The judging of the projects will take place from March 26 to 27, followed by the showcase and awards ceremony on March 28. This event is free of charge and open to prospective Flatiron School students, employers, and the general public. Reserve your spot today at the Hackonomics 2024 Showcase and Awards Ceremony and don’t miss this opportunity to witness firsthand the innovative solutions that emerge from the intersection of AI and finance. 

Unveiling Hackonomics, Flatiron’s AI-Powered Budgeting Hackathon

Are you interested in learning about how software engineering, data science, product design, and cybersecurity can be combined to solve personal finance problems? Look no further, because Flatiron’s AI-powered budgeting hackathon—Hackonomics—is here to ignite your curiosity.

This post will guide you through our Hackonomics event and the problems its final projects aim to solve. Buckle up and get ready to learn how we’ll revolutionize personal finance with the power of AI.

Source: Generated by Canva and Angelica Spratley

Unveiling the Challenge

Picture this: a diverse cohort of recent Flatiron bootcamp graduates coming together on teams to tackle an issue that perplexes and frustrates a huge swath of the population—personal budgeting.

Hackonomics participants will be tasked with building a financial planning application named Money Magnet. What must Money Magnet do? Utilize AI to analyze spending patterns, income sources, and financial goals across family or individual bank accounts.

The goal? To provide personalized recommendations for optimizing budgets, identifying potential savings, and achieving financial goals through a dynamic platform that contains a user-friendly design with interactive dashboards, a personalized recommendation system to achieve budget goals, API integration of all financial accounts, data encryption to protect financial data, and more.

The Impact of AI in Personal Finance

Let’s dive a little deeper into what this entails. Integrating AI into personal finance isn’t just about creating fancy algorithms; it’s about transforming lives through the improvement of financial management. Imagine a single parent struggling to make ends meet, unsure of where their hard-earned money is going each month. With AI-powered budgeting, they can gain insights into their spending habits, receive tailored recommendations on how to save more effectively, and ultimately, regain control of their financial future. It’s about democratizing financial literacy and empowering individuals from all walks of life to make informed decisions about their money.

Crafting an Intuitive Technical Solution Through Collaboration

As the teams embark on this journey, the significance of Hackonomics becomes abundantly clear. It’s not just about building an advanced budgeting product. It’s about building a solution that has the power to vastly improve the financial health and wealth of many. By harnessing the collective talents of graduates from Flatiron School’s Cybersecurity, Data Science, Product Design, and Software Engineering bootcamps, Hackonomics has the opportunity to make a tangible impact on people’s lives.

Let’s now discuss the technical aspects of this endeavor. The platforms must be intuitive, user-friendly, and accessible to individuals with varying levels of financial literacy. They also need to be up and running with personalized suggestions in minutes, not hours, ensuring that anyone can easily navigate and understand their financial situation. 

Source: Generated by Canva and Angelica Spratley

Embracing the Challenge of Hackonomics

Let’s not lose sight of the bigger picture. Yes, the teams are participating to build a groundbreaking platform, but they’re also participating to inspire change. Change in the way we think about personal finance, change in the way we leverage technology for social good, and change in the way we empower individuals to take control of their financial destinies.

For those participating in Hackonomics, it’s not just about building a cool project. It’s about honing skills, showcasing talents, and positioning themselves for future opportunities. As participants develop their AI-powered budgeting platforms, they’ll demonstrate technical prowess, creativity, collaborative skills, and problem-solving abilities. In the end, they’ll enhance their portfolios with AI projects, bettering their chances of standing out to potential employers. By seizing this opportunity, they’ll not only revolutionize personal finance but also propel their careers forward.

Attend the Hackonomics Project Showcase and Awards Ceremony Online

Participation in Hackonomics is exclusively for Flatiron graduates. Participants will build their projects from March 8 through March 25. Winners will be announced during our project showcase and awards ceremony closing event on March 28.

If you’re interested in attending the showcase and ceremony on March 28, RSVP for free through our Eventbrite page Hackonomics 2024 Showcase and Awards Ceremony. This is a great opportunity for prospective students to see the types of projects they can work on should they decide to apply to one of Flatiron’s bootcamp programs.