We all are surrounded by data. Data is a magical thing that is growing just the same as Carbon-dioxide has grown in our environment, with the difference that it is good for us, not as bad as growing global warming. I say this because if we have a huge amount of data, then we will have the power to make better decisions in the future.
We all generate new data every day, either by liking a post or commenting on the other’s post, or when we upload a new post on any social site.
Nowadays, companies are very sensitive about data, as collecting, storing, processing, and analyzing the data is vital for making better decisions. Every company have to take web development seriously as everything is online in the present and will be in the future world.
There are so many tools and programming languages that help us to do the above mentioned tasks. Excel is a powerful spreadsheet tool for doing data analyses.
But it has so many limitations when it comes to tackling a huge amount of data. Most companies use EXCEL + VBA Scripting to do some complex calculations, but it also has various limitations.
So data analysts always try to adopt new ways that help them to speed up their work and generate quality analysis. To do that, data analysts use Programming languages that are far more powerful than any other spreadsheet tool. Python and R are the most preferred programming languages for doing data analysis.
In this blog, I will not talk about the R programming language, but we will explore the power of Python. You will learn how to handle large amounts of data using a real-life example through this blog.
Requirements to start Programming
What you will require before starting the actual programming:
- Python should be installed in your system
- You should have an editor where you write the python code. I suggest you install Jupyter Notebook.
- Install Numpy and Pandas Library before starting the coding.
- Last but the most important point is that you should have the curiosity to go beyond the limits of using data. Curiosity is key!
Now that you have all the requirements aligned, let’s start the journey of data analysis.
Setting Workspace
Importing Data
View the file format of your data. And add code accordingly:
If you have a CSV file, then write the following code:
df = pd.read_csv(r“Actual_path_of_your_csv_file”)
If you have an Excel file, then write following code:
df = pd.read_excel(open(r“Actual_path_of_your_excel_file”, “rb”), sheet_name=“Name_of_sheet_which_you_want_to_import”)
I have an excel sheet, so I used the second option in the following example.
Basic Functions about the Data
Now you have imported the data into Python. Next step is that you need to apply so that you have a Bird Eye View of your data.
Shape Function
The shape function shows you the total number of rows and columns in your imported file. Write df.shape in your Jupyter notebook cell and execute the cell by pressing Shift+Enter.
If you are only interested in Rows, then write df.shape[0]
If you are only interested in Columns, then write df.shape[1]
Head Function
If you want to see the top few records, then you can use head(). Write df.head() in your Jupyter notebook cell and execute the cell by pressing Shift+Enter. It will return a data frame with the top five records.
If you want to see more than 5 records, then you can mention the number in round brackets df.head(10) now it returns the top 10 records.
Tail Function
If you want to see the few records from the bottom, then you can use tail(). Write df.tail() in your Jupyter notebook cell and execute the cell by pressing Shift+Enter. It will return a data frame with the bottom five records.
If you want to see more than five records, then you can mention the number in round brackets df.tail(10); now, it returns the top 10 records.
Getting all Column Names
If you want to get the names of all columns, then you just simply write df.columns, and it will return the all column name.
Getting the Specific Column
You can extract any column by using its name. After applying the code below which will return you a list of values that are stored in the column.
Syntax:
Dataframe[“Column_name”]
Example:
df[“Candidate Name”]
Check the Data Type of Column
Now, as we know that we store data in columns, and we will be curious to know about the data type of column before applying any operations on it. So for that, write the following code in your Jupyter Notebook cell:
Syntax:
Dataframe[“Column_name”].dtype()
Example:
df[“Candidate Age”].dtype()
Sum Function
If you have some numeric columns in your data and you just want to know the Sum by adding each value of that particular column, then you can use the sum() function.
Before applying this formula make sure that column type is not String
Syntax:
Dataframe[“Column_name”].sum()
Example:
df[” Total Valid Votes”].sum()
In the following example, I sum up all the valid votes which are polled in 117 constituencies of Punjab.
Finding the Average of a Particular Column
If you want to find the average of a column then you can use mean() function
Syntax:
Dataframe[“Column_name”].mean()
Example:
df[” Total Valid Votes”].mean()
In the following example, I got the average votes which are polled for each candidate.
Finding the Maximum Value of a Particular Column
If you want to find the maximum value of a column then you can use max() function
Syntax:
Dataframe[“Column_name”].max()
Example:
df[” Total Valid Votes”].max()
In the following example, I got the maximum votes poll for the candidate.
Finding the Minimum Value of a Particular Column
If you want to find the minimum value of a column then you can use min() function
Syntax:
Dataframe[“Column_name”].min()
Example:
df[” Total Valid Votes”].min()
In the following example, I got the minimum votes polled for the candidate.
Finding the Standard Deviation of a Particular Column
If you want to find the minimum value of a column then you can use std() function
Syntax:
Dataframe[“Column_name”].std()
Example:
df[” Total Valid Votes”].std()
Basic String Functions
Now let us discuss some very useful string functions which are helpful in your day-to-day job. But before applying these string functions, make sure the column type is String.
Finding the Length of String
If you want to find the minimum value of a column then you can use std() function
Syntax:
Dataframe[“Column_name”].str.len()
Example:
df[“Constituency Name”].str.len()
It will return the list which has numeric values, and these numeric values represent the length of the corresponding String. You can add this list as a New Column if you want to show the length of the String in your data.
Capitalizing the First Character of each Word
As you know that we can not have a Title Case(Capitalize first character of each word) function in excel but python have. So for that use title() function
Syntax:
Dataframe[“Column_name”].str.title()
Example:
df[“Candidate Name”].str.title()
Upper Case
You can use upper() function to make a string characters uppercase
Syntax:
Dataframe[“Column_name”].str.upper()
Example:
df[“Candidate Name”].str.upper()
Lower Case
You can use lower() function to make a string characters lowercase
Syntax:
Dataframe[“Column_name”].str.lower()
Example:
df[“Candidate Name”].str.lower()
Getting Specific Record
To get the specific record from your data, you may confirm that your data has at least one column which has a unique value. The concept is similar to a Primary key in SQL. You can also mix up multiple columns to get a specific record.
Like in my example I extract the records by using Constituency Name and Candidate Name following code:
df[(df[“Constituency Name”] == “Sultanpur Lodhi “) & (df[“Candidate Name”] == “SAJJAN SINGH CHEEMA”)]
Getting a Group of Records
Sometimes you might want to extract the data which belongs to the same category. Like in the following example, I want to extract the data for Sultanpur Lodhi Constituency, and I want Candidate names in the title case, and then I will export this data as sultapur-lodhi-2017.csv
Now sultapur-lodhi-2017.csv file contains data only from Sultanpur Lodhi Constituency.
Wrapping up
So, in this blog, you have learned some basic functions to analyze a huge amount of data. I have just given you a small tour of data analysis in Python. There are tons of things that are uncovered and are there to explore.
To read more blogs, visit www.webdew.com. If you are looking for web design and web development services, our web team will be thrilled to get you what you want! Contact us to know more.