Enterprise Computing Year 12 Unit 1: Data Science | Christopher Kalodikis | YouTubeToText
Video Transcript
Enterprise Computing HSC course unit one
data science so in this data science
unit we are establishing that data is
the foundation of all systems we need
data in our systems because data is what
supports our decision making processes
as humans and also indirectly using AI
to interpret data and then us still
viewing what the AI interprets to
understand what actions we should take
within our Enterprise what we
should do moving forward so this whole
unit is targeted at understanding data
so the first subsection is that of
collecting storing and analyzing data
three separate processes that are
aligned with data in how we get it how
we handle it and how we understand it
firstly it's understanding the
difference between quantitative and
qualitative data quantitative being
data measured in numbers and amounts and then
qualitative being descriptive data about qualities we
need to understand that
distinction this day and age it's
very easy to collect data and store data
but then that brings us to our next
point of Big Data the fact that we can
get data very easily but this data then
accumulates and takes up file space so
we need specialized systems and make use
of online storage for storing Big Data
because it is hard to store but having
that data is valuable because we can
make lots of analyses and interpretations
from that data so we need to build our
systems around the notion of Big
Data one fact that you might know about
data is that we need to store data as
specific data types the way we store
data impacts on its function what
software can be used with it compression
that can be applied to it screen tools
we can use to gather it and interpret it
so that is the data types and the basic
data types are text and number which can
be in the form of integers and floating
point numbers okay but then we also have
booleans where we can select
specifically if it's an on or off or yes
or no type of response within it and
then obviously the file extensions that
can be applied to data as well all of
those impact on data's data type and
determine how a system will use that
data from here then we have
measurements applied to data so a
variety of different ways of scaling
data and grasping how large it could be
and how it could be used there and
there's some interesting terms there
that I won't go into at the moment
because I'm still learning them myself
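The basic data types mentioned above (text, numbers as integers and floating points, and booleans) can be sketched in Python. This is only an illustration: the record and its field names are made up for the example and are not from the course.

```python
# Hypothetical customer record; the field names are illustrative only.
record = {
    "name": "Alex",          # text (str)
    "orders": 12,            # whole number (int)
    "total_spent": 249.95,   # floating point number (float)
    "subscribed": True,      # boolean: yes/no, on/off (bool)
}

def describe_types(rec):
    """Return a mapping of field name to the name of its Python data type."""
    return {field: type(value).__name__ for field, value in rec.items()}

types = describe_types(record)
for field, type_name in types.items():
    print(f"{field}: {type_name}")

# The data type decides which operations make sense on a value.
average_order = record["total_spent"] / record["orders"]  # arithmetic on numbers
shout = record["name"].upper()                            # text operation on str
```

The point of the sketch is that the stored type controls what a system can do with a value: numbers can be averaged, text can be reformatted, and a boolean can only be switched on or off.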
from here we've got data sampling
when we are getting data from the
environment and putting it into the
system and one such area we know with
data sampling is that of gathering audio
data that's actually called sampling but
we're going beyond that here too and
then the notion of active and passive
sampling when we are intentionally
getting specific types of data or
whether the system is doing it itself
automatically and Gathering data for
us we then have aspects related to data
relevance okay how relevant is the data
for the enterprise's operations
accuracy the correctness of data that
we are getting which we're using in our
system the validity that data is valid
and follows the appropriate rules for it
to be accepted by the system and then the
reliability of data in satisfying what
we are using it for so they do overlap
they are all features of each other but
all slightly different definitions there
ultimately making sure we are getting data for
our system that is correct and
meaningful for our operations
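As a rough illustration of the validity idea above (data following the rules the system expects), here is a minimal sketch of a validation check. The field names, the email pattern, and the age limits are assumptions made up for this example, not rules taken from the course.

```python
import re

# Hypothetical rules: an age must be a whole number from 0 to 120,
# and an email must look roughly like name@domain.tld.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_entry(entry):
    """Return a list of problems found in one data entry (empty if valid)."""
    problems = []
    age = entry.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(age, int) or isinstance(age, bool):
        problems.append("age must be a whole number")
    elif not 0 <= age <= 120:
        problems.append("age is outside the allowed range")
    email = entry.get("email", "")
    if not isinstance(email, str) or not EMAIL_PATTERN.match(email):
        problems.append("email is in the wrong format")
    return problems

good = validate_entry({"age": 17, "email": "student@example.com"})
bad = validate_entry({"age": 170, "email": "student-at-example"})
print(good)
print(bad)
```

A check like this only confirms that an entry follows the rules. It cannot tell whether the value is actually true, which is why accuracy is listed as a separate feature from validity.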
we then have informatics supporting our
understanding of data now this can be
done in a variety of ways but from the way
data is then displayed
in the system we can start making those
interpretations which brings us then to
our next point of presenting data and
I've got this one in yellow because this
might be ways you are presenting it
within your assessment task through
graphs and infographics okay which begin
illustrating data and making it easier to
comprehend those spreadsheet style
dashboards where we have data and it's
represented visually but then if we
start changing the data within this
spreadsheet then the actual graphs that
are on display and the pivot tables that are
on display change live in response
to the values we are changing we can use
data then to generate reports as an
output of data that could be presented
and have our own interpretations written
on it and we can also establish things
such as network diagrams and maps which
can show the makeup of
different segments of a network or a
geographical location and how data might
differ as it's dispersed
across a specific network or landscape
so all those features can be used to present
data then we can talk about structured
and unstructured data sets and this can
be affiliated with big data but
essentially as data is accumulated is it
in a structured format or is it just
Gathering numbers and we need to kind of
structure it later and what we do there
okay so we need to differentiate between
those two forms of data sets Okay from
here then we also need to gather sources
of feedback based on our data or based
on our system okay where data is acting
and we are getting it but then what
response are we making in relation to
that data because that's what it's all
about we get the data and we make a
response so it's ensuring that we know
what our sources of feedback are and we have
criteria to make sure that our feedback
is effective and appropriate for
whatever our system is
doing now we then come to errors in data
and errors can be detrimental to
operations so they must be identified so
that they can be addressed errors can
come at the initial point of collection
from our data sources which is why it's
so important to cross reference our data
sources if we are putting incorrect data
into our system it will ultimately produce
incorrect information and once processed
create incorrect values that our system
will use and we need to make sure we
identify that so we then don't use those
incorrect values as a part of our
decision-making processes as said this
relates to raw versus processed data okay
when data goes in raw we haven't checked
it there could be errors there and if
it's then processed okay it will lead to
incorrect operations taking place so we
need things such as validations and
verifications in place to check data
when it is entered into a system that it
does go in correctly and if someone
accidentally does do a typo okay it will
identify this is the wrong format or
doesn't follow the range limits we put
on it okay so there are rules in place to
ensure that when data goes into the
system validation and
verification ensure that it's
entered in a correct format but as said
if the data source is incorrect what we're
going to get from it is still going to go in
incorrectly so we need to do our own
research on our end to cross check data
and make sure it's correct the other
area of error related to data is that of
bias that we are selecting data sources
that skew data to a specific way that we
want it to be this can be intentional or
it could be unintentional that we're
just not doing a wide enough level of
research and Gathering data from a wide
enough array of different sources for
our system so we've got to factor in that
bias can lead to errors as well and
we've got to try to counteract that by
getting data from a variety of
diverse locations that
fully represent the scope of data we're
trying to represent within our enterprise
system the next one then is blockchain
blockchain being that we can track the
movement of data and this is obviously
heavily affiliated with cryptocurrency
and that might be the best way to
understand that we can actually see how
a cryptocurrency such as Bitcoin has
moved through different ownerships and
we can actually track it from its
Inception so we can actually track data
that's what blockchain is all about so
areas where blockchain can be used
such as for online voting and tracking
who's doing specific voting online
identities and what those identities are
doing the movement of specific items
this could be digital items or
physical items but knowing who has
ownership of them and thus supporting
recordkeeping we can put a name to these
things okay and I should specify with
online voting too they're probably not
tracking the name of the person they're
probably just tracking their voting
you're not allowed to track who they
actually voted for because it
is meant to have anonymity to it and
all of
that the next area is then privacy and
security of data and specific tools that
we've got to be aware of where we might
have to put security procedures in place
one such tool which is obvious is
autofill it's great that our personal
information and our financial
information can be remembered by our
browser and inserted
into text boxes automatically when we
do an online purchase but then
there's a security Factor related to
that so convenience can be at the cost
of security we got to weigh that up with
our system we also have that of private
and public connections it's great that we can
go somewhere such as a public library
and access a public network connection
but is it secure whereas if I use my own
hotspot or if I just do my work from
home I have a better private connection
there the use of checkboxes too can
also be a factor in relation to security
when we're switching things on and off
and how it's being used and then also
terms of agreements for the things that
we sign up for are we actually reading
them and that's a big issue because we
sign up for a lot of things these days
specifically with online platforms but
do we fully understand what we're
signing up for it is stated in their terms
of agreement they do say such as through
social media platforms how they're going
to use our data but we didn't even read
it because you know sometimes we sign up when
we're young and we don't even care but
those terms of agreement could say that
they're going to use the pictures we're
uploading to a platform as a part of
their own business okay or it could also
limit what we can do as a part of their
licensing agreement how we use their
specific data and platforms so all this
is important and it's all written within
terms of agreement it's just so long to
read and that's also an issue there in
relation to privacy and security
and then we also have the impact of data
scale the amount of data that is
available we are very data Rich these
days it's very easy to get data as said
with big data so we've got to factor in
the volume of raw data we're putting
into our systems how much we're putting
in where's it going to be stored whether
locally or in online platforms which is
more so the case so that it can be
networked as part of a large enterprise
system how data might not necessarily be
downloaded from these online
platforms but it's more likely to be
streamed live to keep the data off the
local storage and keep it on the online
storage for
efficiency the way machine learning
interacts with data so machine learning
is obviously when the AI is learning
by itself so based on it accumulating data
it changes its responses and interprets
data in different ways so that
accumulation helps it learn data can
also impact on human behavior us as
humans responding to data what do we see
how do we change our actions in response
to data and then the ethical
implications of data what do we do in
response to data and also where are we
getting data from is it always ethical
how we get the data all right and where
can we read data from and who owns that
data so there's many aspects to data and
specifically the collection and who is
viewing it that relate to the ethics of
it okay not all data can be public
because it relates to
private individuals in some cases so
there's many ethical implications in
relation to the impact of data okay the
final two things I'll talk about in this
section are firstly data storage how data
will be stored and I've already said it
a few times that we have data that could
be stored on the local storage of a
system on its hard drives and solid
state drives we can also have local network
storage where we have our own servers
but also these days as well we have
cloud storage and then a variety
of ways that can be used public clouds
private clouds hybrid clouds and then
that often is the foundation for the
enterprise system and the sharing of
data across a Global Network for that
system so that is then the data storage
but then we also have this thing called
a data warehouse because we have so much
data okay sometimes we take data from
specific time periods so it might be
last year's data related to last year's
customers okay and then we save that
away to a data warehouse once put in
that data warehouse it might then go
with all our previous years okay worth
of data in that warehouse and we store
it there to analyze that data using
technologies such as OLAP okay which are
used for data mining and in that data
warehouse we then can look for Trends
and patterns in historical data that can
support us in planning for future
operations so a very supportive tool
not only for the storage of data but also the
analysis of data okay and hopefully
assisting us with predicting successful
plans for the future the next section
then is that of data quality data
quality means that obviously data is
correct and reliable but also that data is
meaningful for the operations of an
Enterprise so firstly is the ethical use
of data as we already said with ethical
implications we've got this data now we
need to control who can view this
data and that might be linked to
permissions and who the data is relevant
to as a part of their operations within
the Enterprise and also the sharing of
data and data transparency and the fact
that we have people's personal data
we've got to keep it secure from cyber
security threats and things like that as well so
we've got to keep an ethical lens on
when accessing data realizing data is
valuable and we've got to keep it
private this links us to our social
legal and ethical issues such as bias
which I spoke about before where we can
skew data in different directions and we
should try to get data from a variety of
sources the accuracy of data and how
correct it is the use of metadata the
data behind data okay which is the
fundamentals of databases and websites
and the fact that that also needs to be
kept private because that has links
to private
information copyright of specific data
and systems and the acknowledgement of
sources of data that are used within our
systems that we are referencing systems
companies people who produce data when
being used with our systems and then
stemming from that IP intellectual
property and then ICIP Indigenous
Cultural and Intellectual Property okay that
we know the laws that are around these
things and we've got to respect those
laws when we are using systems and data
that are under IP or
ICIP the establishment of permissions
rights and privacy rules around data
which we've mentioned before once again
to limit who can view data within
systems while we can all work for the
same Enterprise we shouldn't all have
access to all data of the Enterprise
that's why permissions and rights are
important to establish and then our
security tools for protecting our system
and our Network okay our login
procedures our use of Biometrics
encrypting data in transmission and in storage
setting up a firewall for our Network a
whole variety of tools built to protect
our network from cyber security threats
specifically on the legal aspects of
data too we need to know existing
legislation in place such as the
Privacy Act 1988 and those principles
that surround it okay and then also if
we're unsure about things and we need
guidance who are the responsible
authorities we know the government but
then who within the government groups
such as the OAIC who we contact in the
instance of a data breach things like
that okay that we need to know
specifically who to go to in instances
where there are concerns about data then
we also need to know about data
sovereignty of indigenous peoples and
how we support them and how data is used
in the context of their cultures and
their community and we still respect
their traditions and beliefs in how we
use that data to support
them okay and then we've got curated and
communicated data on social behavior
okay understanding things such as data
literacy how to actually specifically
understand data timelines of data and
how it is used okay signals and data
swamps and then educating users in this
area once again an area that I need to
look into more to get my own understanding
of it so that final point is relevant
to me too but there's some key terms
that are also very new to this course in
relation to data and social
behavior the final section is processing
and presenting data so data has been
processed turned into information and we're
putting it into a format that we can show
stakeholders clients or peers so that it
is ultimately comprehensible to them so
kind of the output of data that has been
digested in a way for people to
understand and here you're going to see
a lot more yellow boxes because they could
correlate with the things that we could have
embedded into our assessment task so
first one is that of flat file databases
setting up a simple one-table database
that shows a variety of Records usually
related to one specific area that is
done using a database package such as
Microsoft Access we also then have
spreadsheet summaries for the
collation of information so this could
be a user collating information
within the spreadsheet's rows and
cells and all that but it could also be
that I've got a form on the front end
for the collection of data that I've
sent out as a Google Form and I've
shared it with a whole bunch of people
and then when they enter in their
responses it updates in the spreadsheet
okay and then from that spreadsheet I
can then develop things such as
graphs and tables that summarize data
and make it more comprehensible which
then brings us to our next point of
filtering grouping and sorting data we
can use tools within the spreadsheet to
categorize our data and add filters
so we can look at specific data sets and
summarize data and focus on specific
groups we can link sheets with other
sheets and we can also make use of a
thing called conditional formatting
where specific values that meet certain
rules will be highlighted okay it could
be highlighted in red if a certain value
is negative or highlighted green if a
certain value represents that a certain
area is doing well this helps us
with data comparisons and then as said
before we can have forms acting as the
front end for our spreadsheet
collecting data from a variety of
clients and users okay to
accumulate data within our spreadsheet
but then we could also have reports for
our summary where we put this all into a
formatted view to be printed off or
sent out digitally that summarizes all
the information for our
stakeholders a very modern tool these
days and this is also in conjunction with
spreadsheets is that of dashboards so
dashboards are like a very graphical
setup for a spreadsheet and in many
cases we actually get rid of the grid of
the spreadsheet so that big tabular
format kind of disappears and it's all
kind of text boxes and visualizations on
screen that are used to represent the
actual data so there will be a few
numbers on screen but it's more the
visualizations visualizations in the
forms of graphs but these graphs might
change based on us entering different
data and data sets but that could be us
manually entering it we could also be
using things such as pivot tables and
slicers so tables that will shrink and
enlarge based on what slicers are active
so it could be that I have a specific
category of information you could think
of it as subjects at school and when I
click English Advanced only English
Advanced students will appear in the
table and the marks allocated to them
but then if I also click English
Standard English Standard students and
English Advanced students will appear in
the table with their marks all together
side by side so the table will adjust
depending on which slicer categories I
have switched on and off and then that
could also be linked to a graph that is
also adjusting accordingly and
representing metrics in a visualized
format visualization being key and
obviously visualization is now being
introduced here and that correlates with
our next unit of data visualization in
the Enterprise Computing year 12 course
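The slicer behaviour described above can be approximated in a few lines of Python. This is only a sketch: the student names, subjects, and marks are invented, and real spreadsheet slicers are interactive controls rather than function calls.

```python
# Invented sample data: (student, subject, mark)
marks = [
    ("Ava", "English Advanced", 88),
    ("Ben", "English Standard", 74),
    ("Cara", "English Advanced", 91),
    ("Dev", "Mathematics", 67),
]

def apply_slicer(rows, active_subjects):
    """Keep only rows whose subject is currently switched on."""
    return [row for row in rows if row[1] in active_subjects]

def average_by_subject(rows):
    """Summarise the visible rows, like a pivot table of average marks."""
    totals = {}
    for _, subject, mark in rows:
        count, total = totals.get(subject, (0, 0))
        totals[subject] = (count + 1, total + mark)
    return {subject: total / count for subject, (count, total) in totals.items()}

advanced_only = apply_slicer(marks, {"English Advanced"})
both_english = apply_slicer(marks, {"English Advanced", "English Standard"})
print(advanced_only)
print(average_by_subject(both_english))
```

Switching a second category on makes the visible table grow, and any summary or graph built from the visible rows changes with it, which is the same idea as the table and linked graph adjusting together.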
we then have the design of a relational
database so these are the databases that
are larger than flat file databases and
have multiple tables that we often refer
to as entities we create each of these
entities using a data dictionary that
allows us to establish metadata for each
of the entities what is the actual name
of the actual categories in these
entities which we refer to as fields okay
what data types are they made up of giving
descriptions about it how long
will they be how much allocation of
memory will we give for each one we
provide examples of data and describe
the data they are all categories
included in a data dictionary as said we
use multiple entities to make a
relational database but we connect them
through relationships through primary
and foreign Keys each actual entity
needs to have a primary key which is its
main key usually an ID field that is a
specific number format and then we can
drag that over as a foreign key okay
the exact same number to another entity
to establish that relationship once we
have these relational databases
set up we can search them and sort them
and one very fundamental way of doing
that is using SQL structured query
language where we have a series of
keywords used for selecting different
fields and extracting them from specific
tables and then applying a condition
using the WHERE keyword making use of
operators to say if data is greater than
less than equal to or combining criteria
together using AND and OR a whole
variety of tools for searching and
sorting within a relational database
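To make the primary key, foreign key, and SQL ideas above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, field names, and rows are hypothetical, and SQLite simply stands in for whatever database management system is actually used in class.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,   -- primary key of this entity
    Name       TEXT NOT NULL
);
CREATE TABLE Purchase (
    PurchaseID INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID),  -- foreign key
    Amount     REAL NOT NULL
);
INSERT INTO Customer VALUES (1, 'Ava'), (2, 'Ben');
INSERT INTO Purchase VALUES (1, 1, 120.0), (2, 1, 35.5), (3, 2, 80.0);
""")

# SELECT chooses fields, FROM names the tables, WHERE applies the condition,
# AND combines criteria, ORDER BY sorts the result.
query = """
SELECT Customer.Name, Purchase.Amount
FROM Customer, Purchase
WHERE Customer.CustomerID = Purchase.CustomerID
  AND Purchase.Amount > 50
ORDER BY Purchase.Amount DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

The query links the two entities by matching the foreign key in Purchase to the primary key in Customer, then keeps only purchases over 50 and sorts them from largest to smallest.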
but also there's things such as QBE
within modern database management
systems that can do all this for us
using interfaces but we still need
to know SQL because we're going to be
doing this in HSC and we can't use
software in the HSC we've got to do it
with our minds writing out the specific
code and then we mentioned them
before forms and reports in relation
to filtering grouping sorting data well we
can set them up using database
Management Systems in a relational
database for collecting data and
displaying data at both the front end of
collection and at the back end of
displaying information the final thing
about this unit is that of machine
learning and statistical modeling and
obviously very modern these days and
obviously a new part of the course in
that we now have systems with neural
networks that can learn by themselves so
they accumulate all this data they
interpret all this data and then they
give us feedback and present the
visualization itself present the
statistics to us in a formatted View
summarizing it for us making our life
a lot easier
because one of the whole
themes of this unit that you've seen
with data science is how much data we
are collecting now okay terabytes of
data exabytes of data now okay data
amounts that we can't comprehend and
these larger Enterprise systems are
handling them daily the amount of data
think about how much data Google gets in
a day so if we can have machine learning
supporting us in this processing and
then giving us its output in a
statistical format in a model that we
can understand because it's a good
summary of that data that is of great
benefit to us as humans so I hope this
video has given you an understanding of
this first unit of data science a lot of
new technical terms in this unit and
essentially the purpose of the unit is
understanding the foundations of data in
how we collect it how it is made how we
store it how we analyze it and
essentially data quality
how it is of quality to us and is
meaningful to us in our operations so we
need to understand and be able to
comprehend it as said this unit kind of
then stems into the second unit of data
visualizations where we start turning
data into a format that is
comprehensible and usable and thus
meaningful to present to people who
aren't as educated in Computing and in
data so they can understand it and use
it for their purposes but we'll get into
that when we do our next mind map on
data visualizations but hopefully at
this point you understand what data
science is all about for the Enterprise