Relational Databases vs Excel
One of the largest differences between Excel and a RDBMS is the ability to set up relationships between tables (SAIT, 2017). Using a Primary Key and Foreign key relationship tables are able to reference each other when querying data (SAIT, 2017) .In Excel there can have multiple worksheets within a workbook and some basic referencing can occur, but this system allows reference to a value rather than a record. Excel can use the value in cell A1 in worksheet 1 to use the value in cell C5 from worksheet 2, but Excel is unable to set up a relationship like we see in a RDBMS. With a table in a RDMS there is a primary key that uniquely identifies the record (Mark Shellman & Sasha Vodnik, 2019). Each entry will represent a unique piece of data and this reduces data redundancy. In Excel this is not the case and duplicate rows can be entered easily.Another fundamental difference between a RDBMS and Excel is
how each views rows/records and columns/fields.
In a RDMS a record is seen as a single entity, each field in the record
describes a characteristic of that record.
As an example, we can say an apple is red, red is a characteristic of
the apple and should always be associated with it (SAIT, 2017). In Excel this is not the case, although it is
still organized into rows and columns, and generally a row is going to
represent characteristics of an entity occurrence, such as our apple being red.
Excel itself doesn't see this relationship and it is relatively easy to move
the red value to another record.
This difference in how a RDMBs and Excel handle the data structure
can be most easily seen when sorting data.
In Excel it is possible to sort a worksheet on a single field while not
sorting the other fields in the worksheet (Microsoft, n.d.). For
example, we can sort Column A but not Column B.
While there are times this is useful, there are times it can be
disastrous to your data if not noticed, perhaps the apple in the previous
example is now orange. In a RDMS we can
sort records by a field, but it sorts the entire record, an apple is always
red.
Some other differences with Excel and a RDBMS, while perhaps
not fundamental but still important, exist around how data is treated within
the software package as it is entered.
In Access as data is entered, the system writes the new entries into the
database (Mark Shellman & Sasha Vodnik, 2019). In Excel the document must be saved before
data is updated to the physical drive.
Another difference between the two is in a RDBMS new records are added
to the end of the database, while in Excel rows can be inserted where we want (Mark Shellman & Sasha Vodnik, 2019).
These aspects of a relational database, unique rows and
related tables, work together to allow each piece of data to be stored once,
lowering or removing redundant data from our system (SAIT, 2017). As an example, if we had an oil well that we
were storing information about, such as the well’s depth and when it was drilled,
and we also wanted to record information about events that happened when we
were drilling. In Access we could set up two tables and use the well's unique identifier
(UWI) to store the wells depth (TD) and drill date in one table and the well's
UWI and drilling notes in another. With
this system we only need to store a Well's TD and drill date once and we can
create a relationship to our drill notes table.
If we did this in Excel, we would need to store the well's TD and drill
date for each record of the drilling notes table. This may not seem like a big issue on a small
scale, but even here a well has 100 drilling notes in them we are storing the
TD and drill dates 99 more times than we need to. You can imagine how quickly the data storage requirements
will balloon as we want to store more information about a given well such as operator,
current status, if it is horizontal etc.
In our modern world physical storage space isn't expensive,
so this may not seem like it is a large issue.
There are other aspects of storing large amounts of data that should be
considered beyond cost.
One important aspect to consider when thinking about databases
like Excel vs RDBMS is simply how much data each system can store and use effectively,
the RDBMS can utilize much more data than Excel can. Using related tables allows RDBMS to reduce
stored data in the first place and then use and process it better, we can store
data across multiple worksheets in Excel but doing so isn’t practical and
causes performance issues. (Peter Kane, 2020). With a RDBMS we can use the relationships and
SQL to query the database to return only the data we need to work on (Peter Kane, 2020).
One of the primary things to consider is trust in your
data. With the amount of data, users
deal with in the modern world they can’t simply visually check that the data is
accurate. Often dealing with thousands
of rows we cannot simply check the data visually. Using tables and the relationships between
them we can be assured that the data will be consistent and accurate (Mark Shellman & Sasha Vodnik, 2019). As an example, if we are entering the wells
from our previous example and we need to manually type in the UWI each time we
enter a record as we would in Excel a 0 might get switched for a 1 or some
other error that goes unnoticed but could affect our results. In Access this UWI is entered once in the primary
table and linked to the related table.
Despite all of the benefits of using a RDBMS over Excel to
store data, why might a company still turn to Excel to store and process data?
Works Cited.
Mark Shellman & Sasha Vodnik. (2019). New Pespectives:
Microsoft Office 365 & Access 2019. Cenage Learning Inc.
Microsoft. (n.d.). Sort data in a
range or table. Microsoft Support.
https://support.microsoft.com/en-us/office/sort-data-in-a-range-or-table-62d0b95d-2a90-4610-a6ae-2e545c4a4654
Peter Kane. (2020, July 14). We Have
Excel, Who Needs SQL. Medium.Com.
https://medium.com/swlh/we-have-excel-who-needs-sql-10035a7d7d44
SAIT. (2017, January 1). Database
Concepts: Tables and Relationships. Winter 2022 - GIS Data Modelling
(GEOS-418-O3A).
https://learn.sait.ca/content/enforced/491285-202130GEOS-418-O3A/course_files/lectures/W03_DatabaseTables.pdf?_&d2lSessionVal=rWFKxOu8deTMkRmfMZOVIjHtk&ou=491285
Comments
Post a Comment