Reading in Data from a File
Example Book1.txt (Stored in C:\Users\…\Documents\Test1)
1→A→Small
2→B→Medium
3→C→Large
MyTable1=readtable('C:\Users\...\Documents\Test1\Book1.txt')
This file uses the tab [→] delimiter (the tab key is used to represent that one moves to the next column). This can be specified.
MyTable1=readtable('C:\Users\...\Documents\Test1\Book1.txt','Delimiter','\t')
Example Book2.csv (Stored in C:\Users\…\Documents\Test1)
1,A,Small
2,B,Medium
3,C,Large
MyTable1=readtable('C:\Users\...\Documents\Test1\Book2.csv')
This file uses the comma [,] delimiter (the , is used to represent that one moves to the next column).This can be specified.
MyTable1=readtable('C:\Users\...\Documents\Test1\Book2.csv','Delimiter',',')
Example Book3.txt (Stored in C:\Users\…\Documents\Test1)
Number→Letter→Size
1→A→Small
2→B→Medium
3→C→Large
MyTable3=readtable('C:\Users\...\Documents\Test1\Book3.txt','Delimiter','\t')
The Table consists of a series of Column Vectors. Each Column is treated as a variable and has a VariableName. If the first line is of a different form to the rest, it will be used to assign the Variable Names. These are usually read in by default, if not you can add the additional arguments:
MyTable3=readtable('C:\Users\...\Documents\Test1\Book3.txt','Delimiter','\t','ReadVariableNames',1)
Alternatively if you don't want to read these in:
MyTable3=readtable('C:\Users\...\Documents\Test1\Book3.txt','Delimiter','\t','ReadVariableNames',0)
These can also be changed once the table is read in:
MyTable3.Properties.VariableNames={'Num','Let','Size'}
Example Book4.txt (Stored in C:\Users\…\Documents\Test1)
This File was Created by Philip Yip on the 27/01/2019
Number→Letter→Size
1→A→Small
2→B→Medium
3→C→Large
This Table has 1 HeaderLine it can be removed when the table is read in by using the additional arguments
MyTable4=readtable('C:\Users\...\Documents\Test1\Book4.txt','Delimiter','\t','HeaderLine',1)
Example Book5.txt (Stored in C:\Users\…\Documents\Test1)
This File was Created by Philip Yip on the 27/01/2019
Number→Letter→Size
%% 1 is the first number
1→A→Small
%% 2 is the second number
2→B→Medium
%% 3 is the third number
3→C→Large
This Table has Comments denoted by %% they can be removed when the table is read in by using the additional arguments
MyTable5=readtable('C:\Users\...\Documents\Test1\Book5.txt','Delimiter','\t','HeaderLine',1,'CommentStyle','%%')
A Column (Variable) may be selected from a table using Dot Notation.
MyTable5.Number
MyTable5.Letter
MyTable5.Size
The last Column may be changed to categorical:
MyTable5.Size=categorical(MyTable5.Size)
categories(MyTable5.Size)
The 2nd Column may be deleted using:
MyTable5.Letter=[]
Reading in Data from Multiple Files
Suppose with have a folder C:\Users\…\Documents\Test2
And in this folder we have 3 files:
Book1.txt
This File was Created by Philip Yip on the 27/01/2019
Number→Letter→Size
%% 1 is the first number
1→A→Small
%% 2 is the second number
2→B→Medium
%% 3 is the third number
3→C→Large
Book2.txt
This File was Created by Philip Yip on the 27/01/2019
Number→Letter→Size
%% 4 is the fourth number
4→A→small
%% 5 is the fifth number
5→B→medium
%% 6 is the sixth number
6→C→large
Book3.txt
This File was Created by Philip Yip on the 27/01/2019
Number→Letter→Size
%% 7 is the seventh number
7→A→Small
%% 8 is the eight number
8→B→Medium
%% 9 is the ninth number
9→C→large
One can create a datastore to read multiple files within a folder.
MyDataStore=datastore('C:\Users\...\Documents\Test2')
To preview the contents of the datastore (the first file) we can use:
preview(MyDataStore)
We have 1 HeaderLine in the preview.
This File was Created by Philip Yip on the 27/01/2019
Number→Letter→Size
%% 1 is the first number
1→A→Small
%% 2 is the second number
2→B→Medium
%% 3 is the third number
3→C→Large
We can specify this by using dot notation to change MyDataStore Properties:
MyDataStore.NumHeaderLines=1
To preview the contents of the MyDataStore (the first file) we can once again use:
preview(MyDataStore)
We have lines with comments
This File was Created by Philip Yip on the 27/01/2019
Number→Letter→Size
%% 1 is the first number
1→A→Small
%% 2 is the second number
2→B→Medium
%% 3 is the third number
3→C→Large
We can specify these once again by using dot notation to change MyDataStore Properties:
MyDataStore.CommentStyle='%%'
We are using the tab [→] as the delimiter however if we wanted to change it to a , we could specify the delimiter using:
MyDataStore.delimiter=','
To change the delimiter back to a tab [→] we can use:
MyDataStore.delimiter='\t'
We can also view the formats of each variable (column) using
MyDataStore.TextscanFormats
%f is numeric
%q is non-numeric
%C is Categorical
Okay now we are happy with the settings of MyDataStore we can read the first file in MyDataStore which is Book1.txt and assign it to Data1 using:
Data1=read(MyDataStore)
We can read the second file in MyDataStore which is Book2.txt and assign it to Data2 using:
Data2=read(MyDataStore)
We can read the third file in MyDataStore which is Book3.txt and assign it to Data3 using:
Data3=read(MyDataStore)
Alternatively we can read the entire contents of MyDataStore and assign it to AllData using:
AllData=readall(MyDataStore)
We can once again use dot notation to look at each Variable (Column) of the Table AllData.
AllData.Number
AllData.Letter
AllData.Size
Going back to MyDataStore if we wanted to change the third column so it is categorical we can use
MyDataStore.TextscanFormats{3}='%C'
If we also want to only select the first and third variables (columns) in MyDataStore we can use:
MyDataStore.SelectedVariableNames
={'Number','Size'}
AllData2=readall(MyDataStore)
This time only the selected columns will be read and the third variable (column) will be categorical.
Going back to AllData we are interested in the variable (column) Size and as we didn't change the third column to be categorical before using readall we will create a categorical version of it as a new variable (column). To do this we can use:
AllData.SizeCats=AllData.Size
Now we can use:
AllData.SizeCats=categorical(AllData.SizeCats)
Now we can view the categories using:
categories(AllData.SizeCats)
Although we want only 3 categories we will find that we actually have 6 due to upper and lower case versions of each category being listed individually:
- "Small" and "small"
- "Medium" and "medium"
- "Large" and "large"
To rectify this we can use:
AllData.SizeCats=mergecats(AllData.SizeCats{'small','Small'},'small')
AllData.SizeCats=mergecats(AllData.SizeCats{'medium','Medium'},'medium')
AllData.SizeCats=mergecats(AllData.SizeCats{'large','Large'},'large')
We can once again view the categories using:
categories(AllData.SizeCats)
Now we have only 3 categories:
- "small"
- "medium"
- "large"
Say we were indecisive and after changing all the categories to lower case we decided we instead we wanted the upper case values, we can also use:
AllData.SizeCats=renamecats(AllData.SizeCats{'small','medium','large'},{'Small','Medium','Large'})
We can create a new table which sorts the data by the variable (column) SizeCats
AllDataSorted=sortrows(AllData,'SizeCats')
This lists the column alphabetically:
- "Large"
- "Medium"
- "Small"
If we want to instead list by the size from smallest to largest we can redefine the categories as ordinal
AllData.SizeCats=categorical(AllData.SizeCats,{'Small','Medium','Large'},'ordinal',1)
Now we can create:
AllDataSorted2=sortrows(AllData,'SizeCats')
This lists the column ordinally:
- "Small"
- "Medium"
- "Large"
We can remove the unnecessary column Size using:
AllData.Size=[]
We can then write this table to file to a text file processeddata.txt in the folder (C:\Users\…\Documents\Test2):
writetable(AllData,'C:\Users\...\Documents\Test2\processeddata.txt')
If we want an excel file instead we can use:
writetable(AllData,'C:\Users\...\Documents\Test2\processeddata.xlsx')
If we want to select all the categories that are not Small we can use:
NotSmallCatArray=setdiff(AllData.SizeCats,'Small')
To convert to the categorical array to a cell array of characters we can use:
NotSmallCellArray=cellstr(NotSmallCatArray)
AllData.SizeCats=mergecats(AllData.SizeCats,NotSmallCellArray,'not small')
Discretize
Supposing we have the following table of Student Test Results:
And we have the following criteria:
StudentScore.Marks=readtable('...')
Levels=[0;50;55;70;80;100]
Grades={'F';'D';'C';'B';'A'}
StudentScore.Grades=discretize(StudentScore.Marks,Levels,'categorical',Grades)
Groups
a=[1;2;3;4;5;6;7;8;9;10]
b={'small';'small';'small';'medium';'medium';'medium';'medium';'large';'large';'large'}
c={'a';'a';'b';'b';'c';'c';'c';'c';'d';'d'}
MyTable=table(a,b,c)
[MyGroupsNums,MyGroupsValuesB,MyGroupsValuesC]=findgroups(MyTable.b,MyTable.c)
MyTable.b=categorical(MyTable.b,{'small','medium','large'},'ordinal',1)
[MyGroupsNums,MyGroupsValuesB,MyGroupsValuesC]=findgroups(MyTable.b,MyTable.c)
A
The function is equal can be used to find if two variables a and b are equal:
c=isequal(a,b)
This function can be used to to determine if each value in the input vector d is a number or not. Each index of d which is not a number gives a logical value of 1 in the output vector e which has the same dimensions. If th index of d is not, not a number i.e. is a number then the corresponding index in e is given a logical value of 0:
e=isnan(d)
The function number of non-zero values can be used to determine the number of non-zero values in f and set this value to the output g:
g=nnz(f)
Merging Data
join
innerjoin
outerjoin
Split Apply
splitapply(@func,Data,Group)
mean max min mode
accumarray([Group1,Group2],Data,[],@func)
Merge Table
xtickangle(45)