Relational Databases vs Excel

One aspect of online courses at SAIT in this program is the use of discussion boards to get students to think and hopefully generate some discussion.  The idea is of course to recreate the class discussion and interaction between students.  My contributions seemed like a good inclusion in the blog. 

Relational Databases vs Excel

One of the largest differences between Excel and a RDBMS is the ability to set up relationships between tables (SAIT, 2017). Using a Primary Key and Foreign key relationship tables are able to reference each other when querying data (SAIT, 2017) .In Excel there can have multiple worksheets within a workbook and some basic referencing can occur, but this system allows reference to a value rather than a record.  Excel can use the value in cell A1 in worksheet 1 to use the value in cell C5 from worksheet 2, but Excel is unable to set up a relationship like we see in a RDBMS.  With a table in a RDMS there is a primary key that uniquely identifies the record (Mark Shellman & Sasha Vodnik, 2019). Each entry will represent a unique piece of data and this reduces data redundancy.  In Excel this is not the case and duplicate rows can be entered easily.

Another fundamental difference between a RDBMS and Excel is how each views rows/records and columns/fields.  In a RDMS a record is seen as a single entity, each field in the record describes a characteristic of that record.  As an example, we can say an apple is red, red is a characteristic of the apple and should always be associated with it (SAIT, 2017). In Excel this is not the case, although it is still organized into rows and columns, and generally a row is going to represent characteristics of an entity occurrence, such as our apple being red. Excel itself doesn't see this relationship and it is relatively easy to move the red value to another record.

This difference in how a RDMBs and Excel handle the data structure can be most easily seen when sorting data.  In Excel it is possible to sort a worksheet on a single field while not sorting the other fields in the worksheet (Microsoft, n.d.).  For example, we can sort Column A but not Column B.  While there are times this is useful, there are times it can be disastrous to your data if not noticed, perhaps the apple in the previous example is now orange.  In a RDMS we can sort records by a field, but it sorts the entire record, an apple is always red.

Some other differences with Excel and a RDBMS, while perhaps not fundamental but still important, exist around how data is treated within the software package as it is entered.  In Access as data is entered, the system writes the new entries into the database (Mark Shellman & Sasha Vodnik, 2019).  In Excel the document must be saved before data is updated to the physical drive.  Another difference between the two is in a RDBMS new records are added to the end of the database, while in Excel rows can be inserted where we want (Mark Shellman & Sasha Vodnik, 2019).

These aspects of a relational database, unique rows and related tables, work together to allow each piece of data to be stored once, lowering or removing redundant data from our system (SAIT, 2017). As an example, if we had an oil well that we were storing information about, such as the well’s depth and when it was drilled, and we also wanted to record information about events that happened when we were drilling. In Access we could set up two tables and use the well's unique identifier (UWI) to store the wells depth (TD) and drill date in one table and the well's UWI and drilling notes in another.  With this system we only need to store a Well's TD and drill date once and we can create a relationship to our drill notes table.  If we did this in Excel, we would need to store the well's TD and drill date for each record of the drilling notes table.  This may not seem like a big issue on a small scale, but even here a well has 100 drilling notes in them we are storing the TD and drill dates 99 more times than we need to.  You can imagine how quickly the data storage requirements will balloon as we want to store more information about a given well such as operator, current status, if it is horizontal etc.

In our modern world physical storage space isn't expensive, so this may not seem like it is a large issue.  There are other aspects of storing large amounts of data that should be considered beyond cost. 

One important aspect to consider when thinking about databases like Excel vs RDBMS is simply how much data each system can store and use effectively, the RDBMS can utilize much more data than Excel can.  Using related tables allows RDBMS to reduce stored data in the first place and then use and process it better, we can store data across multiple worksheets in Excel but doing so isn’t practical and causes performance issues. (Peter Kane, 2020).  With a RDBMS we can use the relationships and SQL to query the database to return only the data we need to work on (Peter Kane, 2020).

One of the primary things to consider is trust in your data.  With the amount of data, users deal with in the modern world they can’t simply visually check that the data is accurate.  Often dealing with thousands of rows we cannot simply check the data visually.  Using tables and the relationships between them we can be assured that the data will be consistent and accurate (Mark Shellman & Sasha Vodnik, 2019).  As an example, if we are entering the wells from our previous example and we need to manually type in the UWI each time we enter a record as we would in Excel a 0 might get switched for a 1 or some other error that goes unnoticed but could affect our results.  In Access this UWI is entered once in the primary table and linked to the related table.

Despite all of the benefits of using a RDBMS over Excel to store data, why might a company still turn to Excel to store and process data?

Works Cited.

Mark Shellman & Sasha Vodnik. (2019). New Pespectives: Microsoft Office 365 & Access 2019. Cenage Learning Inc.

Microsoft. (n.d.). Sort data in a range or table. Microsoft Support. https://support.microsoft.com/en-us/office/sort-data-in-a-range-or-table-62d0b95d-2a90-4610-a6ae-2e545c4a4654

Peter Kane. (2020, July 14). We Have Excel, Who Needs SQL. Medium.Com. https://medium.com/swlh/we-have-excel-who-needs-sql-10035a7d7d44

SAIT. (2017, January 1). Database Concepts: Tables and Relationships. Winter 2022 - GIS Data Modelling (GEOS-418-O3A). https://learn.sait.ca/content/enforced/491285-202130GEOS-418-O3A/course_files/lectures/W03_DatabaseTables.pdf?_&d2lSessionVal=rWFKxOu8deTMkRmfMZOVIjHtk&ou=491285

 

Comments