MATLAB: Reading Data from Files

Reading in Data from a File

Example Book1.txt (Stored in C:\Users\…\Documents\Test1)

1ASmall
2BMedium
3CLarge

MyTable1=readtable('C:\Users\...\Documents\Test1\Book1.txt')

This file uses the tab [→] delimiter (the tab key is used to represent that one moves to the next column). This can be specified.

MyTable1=readtable('C:\Users\...\Documents\Test1\Book1.txt','Delimiter','\t')

Example Book2.csv (Stored in C:\Users\…\Documents\Test1)

1,A,Small
2,B,Medium
3,C,Large

MyTable1=readtable('C:\Users\...\Documents\Test1\Book2.csv')

This file uses the comma [,] delimiter (the , is used to represent that one moves to the next column).This can be specified.

MyTable1=readtable('C:\Users\...\Documents\Test1\Book2.csv','Delimiter',',')

Example Book3.txt (Stored in C:\Users\…\Documents\Test1)

Number→Letter→Size
1→A→Small
2→B→Medium
3→C→Large

MyTable3=readtable('C:\Users\...\Documents\Test1\Book3.txt','Delimiter','\t')

The Table consists of a series of Column Vectors. Each Column is treated as a variable and has a VariableName. If the first line is of a different form to the rest, it will be used to assign the Variable Names. These are usually read in by default, if not you can add the additional arguments:

MyTable3=readtable('C:\Users\...\Documents\Test1\Book3.txt','Delimiter','\t','ReadVariableNames',1)

Alternatively if you don’t want to read these in:

MyTable3=readtable('C:\Users\...\Documents\Test1\Book3.txt','Delimiter','\t','ReadVariableNames',0)

These can also be changed once the table is read in:

MyTable3.Properties.VariableNames={'Num','Let','Size'}

Example Book4.txt (Stored in C:\Users\…\Documents\Test1)

This File was Created by Philip Yip on the 27/01/2019
Number→Letter→Size
1→A→Small
2→B→Medium
3→C→Large

This Table has 1 HeaderLine it can be removed when the table is read in by using the additional arguments

MyTable4=readtable('C:\Users\...\Documents\Test1\Book4.txt','Delimiter','\t','HeaderLine',1)

Example Book5.txt (Stored in C:\Users\…\Documents\Test1)

This File was Created by Philip Yip on the 27/01/2019
Number→Letter→Size
%% 1 is the first number
1→A→Small
%% 2 is the second number
2→B→Medium
%% 3 is the third number
3→C→Large

This Table has Comments denoted by %% they can be removed when the table is read in by using the additional arguments

MyTable5=readtable('C:\Users\...\Documents\Test1\Book5.txt','Delimiter','\t','HeaderLine',1,'CommentStyle','%%')

A Column (Variable) may be selected from a table using Dot Notation.

MyTable5.Number
MyTable5.Letter
MyTable5.Size

The last Column may be changed to categorical:

MyTable5.Size=categorical(MyTable5.Size)

categories(MyTable5.Size)

The 2nd Column may be deleted using:

MyTable5.Letter=[]

Reading in Data from Multiple Files

Suppose with have a folder C:\Users\…\Documents\Test2

And in this folder we have 3 files:

Book1.txt

This File was Created by Philip Yip on the 27/01/2019
Number→Letter→Size
%% 1 is the first number
1→A→Small
%% 2 is the second number
2→B→Medium
%% 3 is the third number
3→C→Large

Book2.txt

This File was Created by Philip Yip on the 27/01/2019
Number→Letter→Size
%% 4 is the fourth number
4→A→small
%% 5 is the fifth number
5→B→medium
%% 6 is the sixth number
6→C→large

Book3.txt

This File was Created by Philip Yip on the 27/01/2019
Number→Letter→Size
%% 7 is the seventh number
7→A→Small
%% 8 is the eight number
8→B→Medium
%% 9 is the ninth number
9→C→large

One can create a datastore to read multiple files within a folder.

MyDataStore=datastore('C:\Users\...\Documents\Test2')

To preview the contents of the datastore (the first file) we can use:

preview(MyDataStore)

We have 1 HeaderLine in the preview.

This File was Created by Philip Yip on the 27/01/2019
Number→Letter→Size
%% 1 is the first number
1→A→Small
%% 2 is the second number
2→B→Medium
%% 3 is the third number
3→C→Large

We can specify this by using dot notation to change MyDataStore Properties:

MyDataStore.NumHeaderLines=1

To preview the contents of the MyDataStore (the first file) we can once again use:

preview(MyDataStore)

We have lines with comments

This File was Created by Philip Yip on the 27/01/2019
Number→Letter→Size
%% 1 is the first number
1→A→Small
%% 2 is the second number
2→B→Medium
%% 3 is the third number
3→C→Large

We can specify these once again by using dot notation to change MyDataStore Properties:

MyDataStore.CommentStyle='%%'

We are using the tab [→] as the delimiter however if we wanted to change it to a , we could specify the delimiter using:

MyDataStore.delimiter=','

To change the delimiter back to a tab [] we can use:

MyDataStore.delimiter='\t'

We can also view the formats of each variable (column) using

MyDataStore.TextscanFormats

%f is numeric

%q is non-numeric

%C is Categorical

Okay now we are happy with the settings of MyDataStore we can read the first file in MyDataStore which is Book1.txt and assign it to Data1 using:

Data1=read(MyDataStore)

We can read the second file in MyDataStore which is Book2.txt and assign it to Data2 using:

Data2=read(MyDataStore)

We can read the third file in MyDataStore which is Book3.txt and assign it to Data3 using:

Data3=read(MyDataStore)

Alternatively we can read the entire contents of MyDataStore and assign it to AllData using:

AllData=readall(MyDataStore)

We can once again use dot notation to look at each Variable (Column) of the Table AllData.

AllData.Number

AllData.Letter

AllData.Size

Going back to MyDataStore if we wanted to change the third column so it is categorical we can use

MyDataStore.TextscanFormats{3}='%C'

If we also want to only select the first and third variables (columns) in MyDataStore we can use:

MyDataStore.SelectedVariableNames={'Number','Size'}

AllData2=readall(MyDataStore)

This time only the selected columns will be read and the third variable (column) will be categorical.

Going back to AllData we are interested in the variable (column) Size and as we didn’t change the third column to be categorical before using readall we will create a categorical version of it as a new variable (column). To do this we can use:

AllData.SizeCats=AllData.Size

Now we can use:

AllData.SizeCats=categorical(AllData.SizeCats)

Now we can view the categories using:

categories(AllData.SizeCats)

Although we want only 3 categories we will find that we actually have 6 due to upper and lower case versions of each category being listed individually:

  • “Small” and “small”
  • “Medium” and “medium”
  • “Large” and “large”

To rectify this we can use:

AllData.SizeCats=mergecats(AllData.SizeCats{'small','Small'},'small')
AllData.SizeCats=mergecats(AllData.SizeCats{'medium','Medium'},'medium')
AllData.SizeCats=mergecats(AllData.SizeCats{'large','Large'},'large')
We can once again view the categories using:

categories(AllData.SizeCats)

Now we have only 3 categories:

  • “small”
  • “medium”
  • “large”

Say we were indecisive and after changing all the categories to lower case we decided we instead we wanted the upper case values, we can also use:

AllData.SizeCats=renamecats(AllData.SizeCats{'small','medium','large'},{'Small','Medium','Large'})

We can create a new table which sorts the data by the variable (column) SizeCats

AllDataSorted=sortrows(AllData,'SizeCats')

This lists the column alphabetically:

  • “Large”
  • “Medium”
  • “Small”

If we want to instead list by the size from smallest to largest we can redefine the categories as ordinal

AllData.SizeCats=categorical(AllData.SizeCats,{'Small','Medium','Large'},'ordinal',1)

Now we can create:

AllDataSorted2=sortrows(AllData,'SizeCats')

This lists the column ordinally:

  • “Small”
  • “Medium”
  • “Large”

We can remove the unnecessary column Size using:

AllData.Size=[]

We can then write this table to file to a text file processeddata.txt in the folder (C:\Users\…\Documents\Test2):

writetable(AllData,'C:\Users\...\Documents\Test2\processeddata.txt')

If we want an excel file instead we can use:

writetable(AllData,'C:\Users\...\Documents\Test2\processeddata.xlsx')

If we want to select all the categories that are not Small we can use:

NotSmallCatArray=setdiff(AllData.SizeCats,'Small')

To convert to the categorical array to a cell array of characters we can use:

NotSmallCellArray=cellstr(NotSmallCatArray)

AllData.SizeCats=mergecats(AllData.SizeCats,NotSmallCellArray,'not small')

Discretize

Supposing we have the following table of Student Test Results:

And we have the following criteria:

StudentScore.Marks=readtable('...')

Levels=[0;50;55;70;80;100]

Grades={'F';'D';'C';'B';'A'}

StudentScore.Grades=discretize(StudentScore.Marks,Levels,'categorical',Grades)

Groups

a=[1;2;3;4;5;6;7;8;9;10]

b={‘small’;’small’;’small’;’medium’;’medium’;’medium’;’medium’;’large’;’large’;’large’}

c={‘a’;’a’;’b’;’b’;’c’;’c’;’c’;’c’;’d’;’d’}

MyTable=table(a,b,c)

[MyGroupsNums,MyGroupsValuesB,MyGroupsValuesC]=findgroups(MyTable.b,MyTable.c)

MyTable.b=categorical(MyTable.b,{'small','medium','large'},'ordinal',1)

[MyGroupsNums,MyGroupsValuesB,MyGroupsValuesC]=findgroups(MyTable.b,MyTable.c)

 

 

A

The function is equal can be used to find if two variables a and b are equal:

c=isequal(a,b)

This function can be used to to determine if each value in the input vector d is a number or not. Each index of d which is not a number gives a logical value of 1 in the output vector e which has the same dimensions. If th index of d is not, not a number i.e. is a number then the corresponding index in e is given a logical value of 0:

e=isnan(d)

The function number of non-zero values can be used to determine the number of non-zero values in f and set this value to the output g:

g=nnz(f)

 

Merging Data

join

innerjoin

outerjoin

 

Split Apply

splitapply(@func,Data,Group)

mean max min mode

 

accumarray([Group1,Group2],Data,[],@func)

 

Merge Table

hurrs = readtable(‘hurricaneData1990s.txt’);
mnth = month(hurrs.Timestamp,‘name’);
% Group the values in mnth.
[gNum1,gName1] = findgroups(mnth);
% TODO – Find the average wind speed (hurrs.Windspeed)
% by month
avgWS=splitapply(@mean,hurrs.Windspeed,gNum1)
bar(avgWS)
set(gca,‘xticklabel’,gName1)
gName1=monthNum2Name(gName1)
xticklabels(gName1)
xtickangle(45)
Advertisements