Data Visualization - Line Plotting¶

Preparing the enviroment¶

import numpy as np
import pandas as pd
import openpyxl # used to read excel files

Enter the url where your dataset is:¶

URL = 'https://github.com/mmouty/energynds/blob/main/energynds/assets/datasets/UN_MigFlow_All_CountryFiles.zip'

fetching the excel file:¶

df_can = pd.read_excel(
    URL,
    sheet_name='Canada by Citizenship',
    skiprows=range(20),
    skipfooter=2,
    engine='openpyxl')
print('Data downloaded and read into a dataframe!')

Data downloaded and read into a dataframe!

df_can.head() #Just checking the data

Let's get basic information about our dataframe:¶

df_can.info(verbose=False)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1013 entries, 0 to 1012
Columns: 51 entries, Type to Unnamed: 50
dtypes: float64(45), object(6)
memory usage: 403.7+ KB

Let's get the list of column headers¶

df_can.columns

Index([       'Type',    'Coverage',      'OdName',        'AREA',
          'AreaName',         'REG',     'RegName',         'DEV',
           'DevName',          1980,          1981,          1982,
                1983,          1984,          1985,          1986,
                1987,          1988,          1989,          1990,
                1991,          1992,          1993,          1994,
                1995,          1996,          1997,          1998,
                1999,          2000,          2001,          2002,
                2003,          2004,          2005,          2006,
                2007,          2008,          2009,          2010,
                2011,          2012,          2013, 'Unnamed: 43',
       'Unnamed: 44', 'Unnamed: 45', 'Unnamed: 46', 'Unnamed: 47',
       'Unnamed: 48', 'Unnamed: 49', 'Unnamed: 50'],
      dtype='object')

Let's get the list of indicies:¶

df_can.index

RangeIndex(start=0, stop=1013, step=1)

Let's check the type of the columns and index:¶

print(type(df_can.columns))
print(type(df_can.index))

<class 'pandas.core.indexes.base.Index'>
<class 'pandas.core.indexes.range.RangeIndex'>

The default type of intance variables index and columns are not list, so let's change them into list:¶

df_can.columns.tolist()

['Type',
 'Coverage',
 'OdName',
 'AREA',
 'AreaName',
 'REG',
 'RegName',
 'DEV',
 'DevName',
 1980,
 1981,
 1982,
 1983,
 1984,
 1985,
 1986,
 1987,
 1988,
 1989,
 1990,
 1991,
 1992,
 1993,
 1994,
 1995,
 1996,
 1997,
 1998,
 1999,
 2000,
 2001,
 2002,
 2003,
 2004,
 2005,
 2006,
 2007,
 2008,
 2009,
 2010,
 2011,
 2012,
 2013,
 'Unnamed: 43',
 'Unnamed: 44',
 'Unnamed: 45',
 'Unnamed: 46',
 'Unnamed: 47',
 'Unnamed: 48',
 'Unnamed: 49',
 'Unnamed: 50']

df_can.index.tolist()

[0,
 1,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 20,
 21,
 22,
 23,
 24,
 25,
 26,
 27,
 28,
 29,
 30,
 31,
 32,
 33,
 34,
 35,
 36,
 37,
 38,
 39,
 40,
 41,
 42,
 43,
 44,
 45,
 46,
 47,
 48,
 49,
 50,
 51,
 52,
 53,
 54,
 55,
 56,
 57,
 58,
 59,
 60,
 61,
 62,
 63,
 64,
 65,
 66,
 67,
 68,
 69,
 70,
 71,
 72,
 73,
 74,
 75,
 76,
 77,
 78,
 79,
 80,
 81,
 82,
 83,
 84,
 85,
 86,
 87,
 88,
 89,
 90,
 91,
 92,
 93,
 94,
 95,
 96,
 97,
 98,
 99,
 100,
 101,
 102,
 103,
 104,
 105,
 106,
 107,
 108,
 109,
 110,
 111,
 112,
 113,
 114,
 115,
 116,
 117,
 118,
 119,
 120,
 121,
 122,
 123,
 124,
 125,
 126,
 127,
 128,
 129,
 130,
 131,
 132,
 133,
 134,
 135,
 136,
 137,
 138,
 139,
 140,
 141,
 142,
 143,
 144,
 145,
 146,
 147,
 148,
 149,
 150,
 151,
 152,
 153,
 154,
 155,
 156,
 157,
 158,
 159,
 160,
 161,
 162,
 163,
 164,
 165,
 166,
 167,
 168,
 169,
 170,
 171,
 172,
 173,
 174,
 175,
 176,
 177,
 178,
 179,
 180,
 181,
 182,
 183,
 184,
 185,
 186,
 187,
 188,
 189,
 190,
 191,
 192,
 193,
 194,
 195,
 196,
 197,
 198,
 199,
 200,
 201,
 202,
 203,
 204,
 205,
 206,
 207,
 208,
 209,
 210,
 211,
 212,
 213,
 214,
 215,
 216,
 217,
 218,
 219,
 220,
 221,
 222,
 223,
 224,
 225,
 226,
 227,
 228,
 229,
 230,
 231,
 232,
 233,
 234,
 235,
 236,
 237,
 238,
 239,
 240,
 241,
 242,
 243,
 244,
 245,
 246,
 247,
 248,
 249,
 250,
 251,
 252,
 253,
 254,
 255,
 256,
 257,
 258,
 259,
 260,
 261,
 262,
 263,
 264,
 265,
 266,
 267,
 268,
 269,
 270,
 271,
 272,
 273,
 274,
 275,
 276,
 277,
 278,
 279,
 280,
 281,
 282,
 283,
 284,
 285,
 286,
 287,
 288,
 289,
 290,
 291,
 292,
 293,
 294,
 295,
 296,
 297,
 298,
 299,
 300,
 301,
 302,
 303,
 304,
 305,
 306,
 307,
 308,
 309,
 310,
 311,
 312,
 313,
 314,
 315,
 316,
 317,
 318,
 319,
 320,
 321,
 322,
 323,
 324,
 325,
 326,
 327,
 328,
 329,
 330,
 331,
 332,
 333,
 334,
 335,
 336,
 337,
 338,
 339,
 340,
 341,
 342,
 343,
 344,
 345,
 346,
 347,
 348,
 349,
 350,
 351,
 352,
 353,
 354,
 355,
 356,
 357,
 358,
 359,
 360,
 361,
 362,
 363,
 364,
 365,
 366,
 367,
 368,
 369,
 370,
 371,
 372,
 373,
 374,
 375,
 376,
 377,
 378,
 379,
 380,
 381,
 382,
 383,
 384,
 385,
 386,
 387,
 388,
 389,
 390,
 391,
 392,
 393,
 394,
 395,
 396,
 397,
 398,
 399,
 400,
 401,
 402,
 403,
 404,
 405,
 406,
 407,
 408,
 409,
 410,
 411,
 412,
 413,
 414,
 415,
 416,
 417,
 418,
 419,
 420,
 421,
 422,
 423,
 424,
 425,
 426,
 427,
 428,
 429,
 430,
 431,
 432,
 433,
 434,
 435,
 436,
 437,
 438,
 439,
 440,
 441,
 442,
 443,
 444,
 445,
 446,
 447,
 448,
 449,
 450,
 451,
 452,
 453,
 454,
 455,
 456,
 457,
 458,
 459,
 460,
 461,
 462,
 463,
 464,
 465,
 466,
 467,
 468,
 469,
 470,
 471,
 472,
 473,
 474,
 475,
 476,
 477,
 478,
 479,
 480,
 481,
 482,
 483,
 484,
 485,
 486,
 487,
 488,
 489,
 490,
 491,
 492,
 493,
 494,
 495,
 496,
 497,
 498,
 499,
 500,
 501,
 502,
 503,
 504,
 505,
 506,
 507,
 508,
 509,
 510,
 511,
 512,
 513,
 514,
 515,
 516,
 517,
 518,
 519,
 520,
 521,
 522,
 523,
 524,
 525,
 526,
 527,
 528,
 529,
 530,
 531,
 532,
 533,
 534,
 535,
 536,
 537,
 538,
 539,
 540,
 541,
 542,
 543,
 544,
 545,
 546,
 547,
 548,
 549,
 550,
 551,
 552,
 553,
 554,
 555,
 556,
 557,
 558,
 559,
 560,
 561,
 562,
 563,
 564,
 565,
 566,
 567,
 568,
 569,
 570,
 571,
 572,
 573,
 574,
 575,
 576,
 577,
 578,
 579,
 580,
 581,
 582,
 583,
 584,
 585,
 586,
 587,
 588,
 589,
 590,
 591,
 592,
 593,
 594,
 595,
 596,
 597,
 598,
 599,
 600,
 601,
 602,
 603,
 604,
 605,
 606,
 607,
 608,
 609,
 610,
 611,
 612,
 613,
 614,
 615,
 616,
 617,
 618,
 619,
 620,
 621,
 622,
 623,
 624,
 625,
 626,
 627,
 628,
 629,
 630,
 631,
 632,
 633,
 634,
 635,
 636,
 637,
 638,
 639,
 640,
 641,
 642,
 643,
 644,
 645,
 646,
 647,
 648,
 649,
 650,
 651,
 652,
 653,
 654,
 655,
 656,
 657,
 658,
 659,
 660,
 661,
 662,
 663,
 664,
 665,
 666,
 667,
 668,
 669,
 670,
 671,
 672,
 673,
 674,
 675,
 676,
 677,
 678,
 679,
 680,
 681,
 682,
 683,
 684,
 685,
 686,
 687,
 688,
 689,
 690,
 691,
 692,
 693,
 694,
 695,
 696,
 697,
 698,
 699,
 700,
 701,
 702,
 703,
 704,
 705,
 706,
 707,
 708,
 709,
 710,
 711,
 712,
 713,
 714,
 715,
 716,
 717,
 718,
 719,
 720,
 721,
 722,
 723,
 724,
 725,
 726,
 727,
 728,
 729,
 730,
 731,
 732,
 733,
 734,
 735,
 736,
 737,
 738,
 739,
 740,
 741,
 742,
 743,
 744,
 745,
 746,
 747,
 748,
 749,
 750,
 751,
 752,
 753,
 754,
 755,
 756,
 757,
 758,
 759,
 760,
 761,
 762,
 763,
 764,
 765,
 766,
 767,
 768,
 769,
 770,
 771,
 772,
 773,
 774,
 775,
 776,
 777,
 778,
 779,
 780,
 781,
 782,
 783,
 784,
 785,
 786,
 787,
 788,
 789,
 790,
 791,
 792,
 793,
 794,
 795,
 796,
 797,
 798,
 799,
 800,
 801,
 802,
 803,
 804,
 805,
 806,
 807,
 808,
 809,
 810,
 811,
 812,
 813,
 814,
 815,
 816,
 817,
 818,
 819,
 820,
 821,
 822,
 823,
 824,
 825,
 826,
 827,
 828,
 829,
 830,
 831,
 832,
 833,
 834,
 835,
 836,
 837,
 838,
 839,
 840,
 841,
 842,
 843,
 844,
 845,
 846,
 847,
 848,
 849,
 850,
 851,
 852,
 853,
 854,
 855,
 856,
 857,
 858,
 859,
 860,
 861,
 862,
 863,
 864,
 865,
 866,
 867,
 868,
 869,
 870,
 871,
 872,
 873,
 874,
 875,
 876,
 877,
 878,
 879,
 880,
 881,
 882,
 883,
 884,
 885,
 886,
 887,
 888,
 889,
 890,
 891,
 892,
 893,
 894,
 895,
 896,
 897,
 898,
 899,
 900,
 901,
 902,
 903,
 904,
 905,
 906,
 907,
 908,
 909,
 910,
 911,
 912,
 913,
 914,
 915,
 916,
 917,
 918,
 919,
 920,
 921,
 922,
 923,
 924,
 925,
 926,
 927,
 928,
 929,
 930,
 931,
 932,
 933,
 934,
 935,
 936,
 937,
 938,
 939,
 940,
 941,
 942,
 943,
 944,
 945,
 946,
 947,
 948,
 949,
 950,
 951,
 952,
 953,
 954,
 955,
 956,
 957,
 958,
 959,
 960,
 961,
 962,
 963,
 964,
 965,
 966,
 967,
 968,
 969,
 970,
 971,
 972,
 973,
 974,
 975,
 976,
 977,
 978,
 979,
 980,
 981,
 982,
 983,
 984,
 985,
 986,
 987,
 988,
 989,
 990,
 991,
 992,
 993,
 994,
 995,
 996,
 997,
 998,
 999,
 ...]

Now, let's check again:¶

print(type(df_can.columns.tolist()))
print(type(df_can.index.tolist()))

<class 'list'>
<class 'list'>

Let's check the dimensions of our dataframe:¶

# size of dataframe (rows, columns)
df_can.shape

(1013, 51)

We are not going to use all of these columns, so let's drop some:¶

# in pandas axis=0 represents rows (default) and axis=1 represents columns.
df_can.drop(['AREA','REG','DEV','Type','Coverage','Unnamed: 43','Unnamed: 44','Unnamed: 45','Unnamed: 46','Unnamed: 47','Unnamed: 48','Unnamed: 49','Unnamed: 50'], axis=1, inplace=True)
df_can.head(2)

Let's rename the columns:¶

df_can.rename(columns={'OdName':'Country', 'AreaName':'Continent', 'RegName':'Region'}, inplace=True)
df_can.columns

Index([  'Country', 'Continent',    'Region',   'DevName',        1980,
              1981,        1982,        1983,        1984,        1985,
              1986,        1987,        1988,        1989,        1990,
              1991,        1992,        1993,        1994,        1995,
              1996,        1997,        1998,        1999,        2000,
              2001,        2002,        2003,        2004,        2005,
              2006,        2007,        2008,        2009,        2010,
              2011,        2012,        2013],
      dtype='object')

Let's add a 'Total' column that sums up the total immigrants by country over the entire period 1980 - 2013¶

df_can['Total'] = df_can.sum(axis=1)

Let's see how many null objects we have in the dataset¶

df_can.isnull().sum()

Country      816
Continent    816
Region       816
DevName      816
1980         816
1981         816
1982         816
1983         816
1984         816
1985         816
1986         816
1987         816
1988         816
1989         816
1990         816
1991         816
1992         816
1993         816
1994         816
1995         816
1996         816
1997         816
1998         816
1999         816
2000         816
2001         816
2002         816
2003         816
2004         816
2005         816
2006         816
2007         816
2008         816
2009         816
2010         816
2011         816
2012         816
2013         816
Total          0
dtype: int64

Let's view a quick summary of each column in our dataframe:¶

df_can.describe()

Data Slicing:¶

Let's filter on the list of countries ('Country'):¶

df_can.Country  # returns a series

0          Afghanistan
1              Albania
2              Algeria
3       American Samoa
4              Andorra
             ...
1008               NaN
1009               NaN
1010               NaN
1011               NaN
1012               NaN
Name: Country, Length: 1013, dtype: object

Some values are NaN, so let's drop them:¶

df_can.dropna(subset=["Country"], axis=0, inplace=True)

df_can.Country

0         Afghanistan
1             Albania
2             Algeria
3      American Samoa
4             Andorra
            ...
192             Yemen
193            Zambia
194          Zimbabwe
195           Unknown
196             Total
Name: Country, Length: 197, dtype: object

Let's filter on the list of countries and the data for years: 1980 - 1985:¶

df_can[['Country', 1980, 1981, 1982, 1983, 1984, 1985]]

Notice that the default index of the dataset is a numeric range from 0 to 195. This makes it very difficult to do a query by a specific country. For example to search for data on Japan, we need to know the corresponding index value, so let's make the Country column as our index:¶

df_can.set_index('Country', inplace=True)

df_can.head(3)

Let's view the number of immigrants from Japan for the following scenarios:¶

1. The full row data (all columns)¶

df_can.loc['Japan']
# or you can use df_can.iloc[87]

Continent                 Asia
Region            Eastern Asia
DevName      Developed regions
1980                       701
1981                       756
1982                       598
1983                       309
1984                       246
1985                       198
1986                       248
1987                       422
1988                       324
1989                       494
1990                       379
1991                       506
1992                       605
1993                       907
1994                       956
1995                       826
1996                       994
1997                       924
1998                       897
1999                      1083
2000                      1010
2001                      1092
2002                       806
2003                       817
2004                       973
2005                      1067
2006                      1212
2007                      1250
2008                      1284
2009                      1194
2010                      1168
2011                      1265
2012                      1214
2013                       982
Total                    27707
Name: Japan, dtype: object

2. For year 2013¶

df_can.loc['Japan', 2013]
# or you can use df_can.iloc[87, 36]

982.0

3. For years 1980 to 1985¶

df_can.loc['Japan', [1980, 1981, 1982, 1983, 1984, 1984]]
# or you can use df_can.iloc[87, [3, 4, 5, 6, 7, 8]]

1980    701
1981    756
1982    598
1983    309
1984    246
1984    246
Name: Japan, dtype: object

To avoid this ambuigity, let's convert the column names into strings: '1980' to '2013':¶

df_can.columns = list(map(str, df_can.columns))

Now, let's declare a variable that will allow us to easily call upon the full range of years:¶

years = list(map(str, range(1980, 2014)))
years

['1980',
 '1981',
 '1982',
 '1983',
 '1984',
 '1985',
 '1986',
 '1987',
 '1988',
 '1989',
 '1990',
 '1991',
 '1992',
 '1993',
 '1994',
 '1995',
 '1996',
 '1997',
 '1998',
 '1999',
 '2000',
 '2001',
 '2002',
 '2003',
 '2004',
 '2005',
 '2006',
 '2007',
 '2008',
 '2009',
 '2010',
 '2011',
 '2012',
 '2013']

Filtering based on criteria:¶

To filter the dataframe based on a condition, we pass the condition as a boolean vector.¶

For example, Let's filter the dataframe to show the data on Asian countries (AreaName = Asia).¶

# 1. create the condition boolean series
condition = df_can['Continent'] == 'Asia'
print(condition)

Country
Afghanistan        True
Albania           False
Algeria           False
American Samoa    False
Andorra           False
                  ...
Yemen              True
Zambia            False
Zimbabwe          False
Unknown           False
Total             False
Name: Continent, Length: 197, dtype: bool

# 2. pass this condition into the dataFrame
df_can[condition]

We can pass multiple criteria in the same line¶

Let's filter for AreaNAme = Asia and RegName = Southern Asia¶

df_can[(df_can['Continent']=='Asia') & (df_can['Region']=='Southern Asia')]

Let's review the changes we have made to our dataframe.¶

print('data dimensions:', df_can.shape)
print(df_can.columns)
df_can.head(2)

data dimensions: (197, 38)
Index(['Continent', 'Region', 'DevName', '1980', '1981', '1982', '1983',
       '1984', '1985', '1986', '1987', '1988', '1989', '1990', '1991', '1992',
       '1993', '1994', '1995', '1996', '1997', '1998', '1999', '2000', '2001',
       '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010',
       '2011', '2012', '2013', 'Total'],
      dtype='object')

Data Visualization:¶

Let's prepare the enviroment:¶

# we are using the inline backend
%matplotlib inline

import matplotlib as mpl
import matplotlib.pyplot as plt

Now let's check if Matplotlib is loaded¶

print('Matplotlib version: ', mpl.__version__)  # >= 2.0.0

Matplotlib version:  3.4.3

Let's apply style:¶

print(plt.style.available)
mpl.style.use(['ggplot']) # optional: for ggplot-like style

['Solarize_Light2', '_classic_test_patch', 'bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot', 'grayscale', 'seaborn', 'seaborn-bright', 'seaborn-colorblind', 'seaborn-dark', 'seaborn-dark-palette', 'seaborn-darkgrid', 'seaborn-deep', 'seaborn-muted', 'seaborn-notebook', 'seaborn-paper', 'seaborn-pastel', 'seaborn-poster', 'seaborn-talk', 'seaborn-ticks', 'seaborn-white', 'seaborn-whitegrid', 'tableau-colorblind10']

Line Pots (Series/Dataframe)¶

Let's plot a line graph of immigration from Haiti¶

# First, we will extract the data series for Haiti:
haiti = df_can.loc['Haiti', years] # passing in years 1980 - 2013 to exclude the 'total' column
haiti.head()

1980    1666
1981    3692
1982    3498
1983    2860
1984    1418
Name: Haiti, dtype: object

haiti.plot()

<AxesSubplot:>

Let's change the index values of Haiti to type integer for plotting, and label the x and y axes¶

haiti.index = haiti.index.map(int) # let's change the index values of Haiti to type integer for plotting
haiti.plot(kind='line')

plt.title('Immigration from Haiti')
plt.ylabel('Number of immigrants')
plt.xlabel('Years')

plt.show() # need this line to show the updates made to the figure

Let's annotate this spike in the plot "year 2010" (it was due to an earthquake in Haiti)¶

haiti.plot(kind='line')

plt.title('Immigration from Haiti')
plt.ylabel('Number of Immigrants')
plt.xlabel('Years')

# annotate the 2010 Earthquake. 
# syntax: plt.text(x, y, label)
plt.text(2000, 6000, '2010 Earthquake')

plt.show()

Let's compare the number of immigrants from India and China from 1980 to 2013:¶

df_CI = df_can.loc[['India', 'China'], years]
df_CI

# plotting a graph
df_CI.plot(kind='line')

<AxesSubplot:xlabel='Country'>

Not working, we need to transpose the x and y axis:¶

df_CI = df_CI.transpose()
df_CI.head()

# Now, let's try again
df_CI.index = df_CI.index.map(int) # let's change the index values of df_CI to type integer for plotting
df_CI.plot(kind='line')

plt.title('Immigrants from China and India')
plt.ylabel('Number of Immigrants')
plt.xlabel('Years')

plt.show()

Let's compare the trend of top 5 countries that contributed the most to immigration to Canada.¶

1.Get the dataset. Recall that we created a Total column that calculates cumulative immigration by country, we will sort on this column to get our top 5 countries using pandas sort_values() method.¶

inplace = True # paramemter saves the changes to the original df_can dataframe

df_can.sort_values(by='Total', ascending=False, axis=0, inplace=True)

# get the top 5 entries
df_top5 = df_can.head(5)

# transpose the dataframe
df_top5 = df_top5[years].transpose()

print(df_top5)

2. Plot the dataframe. To make the plot more readeable, we will change the size using the `figsize` parameter.¶

df_top5.index = df_top5.index.map(int) # let's change the index values of df_top5 to type integer for plotting
df_top5.plot(kind='line', figsize=(14, 8)) # pass a tuple (x, y) size

plt.title('Immigration Trend of Top 5 Countries')
plt.ylabel('Number of Immigrants')
plt.xlabel('Years')

plt.show()

	1980	1981	1982	1983	1984	1985	1986	1987	1988	1989	...	2005	2006	2007	2008	2009	2010	2011	2012	2013	Total
count	197.000000	197.000000	197.000000	197.000000	197.000000	197.000000	197.000000	197.000000	197.000000	197.000000	...	197.00000	197.000000	197.000000	197.000000	197.000000	197.000000	197.000000	197.000000	197.000000	1.013000e+03
mean	1453.167513	1306.000000	1230.203046	905.431472	896.162437	856.304569	1008.639594	1543.908629	1640.456853	1944.670051	...	2662.35533	2554.720812	2403.583756	2510.091371	2560.101523	2849.614213	2525.360406	2618.304569	2629.654822	1.367070e+04
std	10784.524807	9449.373841	8864.905615	6503.149859	6452.570413	6155.858422	7205.630681	11045.558746	11809.252241	13939.908653	...	19104.65330	18261.461660	17129.490442	17908.071420	18276.881231	20386.174875	18082.496369	18763.159434	18838.224387	2.219784e+05
min	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	...	0.00000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000e+00
25%	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	1.000000	1.000000	1.000000	1.000000	...	29.00000	26.000000	32.000000	32.000000	37.000000	42.000000	38.000000	43.000000	45.000000	0.000000e+00
50%	14.000000	10.000000	12.000000	12.000000	14.000000	17.000000	21.000000	33.000000	36.000000	45.000000	...	211.00000	230.000000	198.000000	205.000000	217.000000	216.000000	196.000000	235.000000	217.000000	0.000000e+00
75%	266.000000	299.000000	299.000000	197.000000	207.000000	202.000000	257.000000	467.000000	476.000000	538.000000	...	857.00000	856.000000	975.000000	975.000000	917.000000	953.000000	851.000000	799.000000	852.000000	0.000000e+00
max	143137.000000	128641.000000	121175.000000	89185.000000	88272.000000	84346.000000	99351.000000	152075.000000	161585.000000	191550.000000	...	262242.00000	251640.000000	236753.000000	247244.000000	252170.000000	280687.000000	248748.000000	257903.000000	259021.000000	6.924212e+06

	Continent	Region	DevName	1980	1981	1982	1983	1984	1985	1986	...	2005	2006	2007	2008	2009	2010	2011	2012	2013	Total
Country
Afghanistan	Asia	Southern Asia	Developing regions	16.0	39.0	39.0	47.0	71.0	340.0	496.0	...	3436.0	3009.0	2652.0	2111.0	1746.0	1758.0	2203.0	2635.0	2004.0	58639.0
Albania	Europe	Southern Europe	Developed regions	1.0	0.0	0.0	0.0	0.0	0.0	1.0	...	1223.0	856.0	702.0	560.0	716.0	561.0	539.0	620.0	603.0	15699.0
Algeria	Africa	Northern Africa	Developing regions	80.0	67.0	71.0	69.0	63.0	44.0	69.0	...	3626.0	4807.0	3623.0	4005.0	5393.0	4752.0	4325.0	3774.0	4331.0	69439.0

	Continent	Region	DevName	1980	1981	1982	1983	1984	1985	1986	...	2005	2006	2007	2008	2009	2010	2011	2012	2013	Total
Country
Afghanistan	Asia	Southern Asia	Developing regions	16.0	39.0	39.0	47.0	71.0	340.0	496.0	...	3436.0	3009.0	2652.0	2111.0	1746.0	1758.0	2203.0	2635.0	2004.0	58639.0
Armenia	Asia	Western Asia	Developing regions	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	224.0	218.0	198.0	205.0	267.0	252.0	236.0	258.0	207.0	3310.0
Azerbaijan	Asia	Western Asia	Developing regions	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	359.0	236.0	203.0	125.0	165.0	209.0	138.0	161.0	57.0	2649.0
Bahrain	Asia	Western Asia	Developing regions	0.0	2.0	1.0	1.0	1.0	3.0	0.0	...	12.0	12.0	22.0	9.0	35.0	28.0	21.0	39.0	32.0	475.0
Bangladesh	Asia	Southern Asia	Developing regions	83.0	84.0	86.0	81.0	98.0	92.0	486.0	...	4171.0	4014.0	2897.0	2939.0	2104.0	4721.0	2694.0	2640.0	3789.0	65568.0
Bhutan	Asia	Southern Asia	Developing regions	0.0	0.0	0.0	0.0	1.0	0.0	0.0	...	5.0	10.0	7.0	36.0	865.0	1464.0	1879.0	1075.0	487.0	5876.0
Brunei Darussalam	Asia	South-Eastern Asia	Developing regions	79.0	6.0	8.0	2.0	2.0	4.0	12.0	...	4.0	5.0	11.0	10.0	5.0	12.0	6.0	3.0	6.0	600.0
Cambodia	Asia	South-Eastern Asia	Developing regions	12.0	19.0	26.0	33.0	10.0	7.0	8.0	...	370.0	529.0	460.0	354.0	203.0	200.0	196.0	233.0	288.0	6538.0
China	Asia	Eastern Asia	Developing regions	5123.0	6682.0	3308.0	1863.0	1527.0	1816.0	1960.0	...	42584.0	33518.0	27642.0	30037.0	29622.0	30391.0	28502.0	33024.0	34129.0	659962.0
China, Hong Kong Special Administrative Region	Asia	Eastern Asia	Developing regions	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	729.0	712.0	674.0	897.0	657.0	623.0	591.0	728.0	774.0	9327.0
China, Macao Special Administrative Region	Asia	Eastern Asia	Developing regions	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	21.0	32.0	16.0	12.0	21.0	21.0	13.0	33.0	29.0	284.0
Cyprus	Asia	Western Asia	Developing regions	132.0	128.0	84.0	46.0	46.0	43.0	48.0	...	7.0	9.0	4.0	7.0	6.0	18.0	6.0	12.0	16.0	1126.0
Democratic People's Republic of Korea	Asia	Eastern Asia	Developing regions	1.0	1.0	3.0	1.0	4.0	3.0	0.0	...	14.0	10.0	7.0	19.0	11.0	45.0	97.0	66.0	17.0	388.0
Georgia	Asia	Western Asia	Developing regions	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	114.0	125.0	132.0	112.0	128.0	126.0	139.0	147.0	125.0	2068.0
India	Asia	Southern Asia	Developing regions	8880.0	8670.0	8147.0	7338.0	5704.0	4211.0	7150.0	...	36210.0	33848.0	28742.0	28261.0	29456.0	34235.0	27509.0	30933.0	33087.0	691904.0
Indonesia	Asia	South-Eastern Asia	Developing regions	186.0	178.0	252.0	115.0	123.0	100.0	127.0	...	632.0	613.0	657.0	661.0	504.0	712.0	390.0	395.0	387.0	13150.0
Iran (Islamic Republic of)	Asia	Southern Asia	Developing regions	1172.0	1429.0	1822.0	1592.0	1977.0	1648.0	1794.0	...	5837.0	7480.0	6974.0	6475.0	6580.0	7477.0	7479.0	7534.0	11291.0	175923.0
Iraq	Asia	Western Asia	Developing regions	262.0	245.0	260.0	380.0	428.0	231.0	265.0	...	2226.0	1788.0	2406.0	3543.0	5450.0	5941.0	6196.0	4041.0	4918.0	69789.0
Israel	Asia	Western Asia	Developing regions	1403.0	1711.0	1334.0	541.0	446.0	680.0	1212.0	...	2446.0	2625.0	2401.0	2562.0	2316.0	2755.0	1970.0	2134.0	1945.0	66508.0
Japan	Asia	Eastern Asia	Developed regions	701.0	756.0	598.0	309.0	246.0	198.0	248.0	...	1067.0	1212.0	1250.0	1284.0	1194.0	1168.0	1265.0	1214.0	982.0	27707.0
Jordan	Asia	Western Asia	Developing regions	177.0	160.0	155.0	113.0	102.0	179.0	181.0	...	1940.0	1827.0	1421.0	1581.0	1235.0	1831.0	1635.0	1206.0	1255.0	35406.0
Kazakhstan	Asia	Central Asia	Developing regions	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	506.0	408.0	436.0	394.0	431.0	377.0	381.0	462.0	348.0	8490.0
Kuwait	Asia	Western Asia	Developing regions	1.0	0.0	8.0	2.0	1.0	4.0	4.0	...	66.0	35.0	62.0	53.0	68.0	67.0	58.0	73.0	48.0	2025.0
Kyrgyzstan	Asia	Central Asia	Developing regions	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	173.0	161.0	135.0	168.0	173.0	157.0	159.0	278.0	123.0	2353.0
Lao People's Democratic Republic	Asia	South-Eastern Asia	Developing regions	11.0	6.0	16.0	16.0	7.0	17.0	21.0	...	42.0	74.0	53.0	32.0	39.0	54.0	22.0	25.0	15.0	1089.0
Lebanon	Asia	Western Asia	Developing regions	1409.0	1119.0	1159.0	789.0	1253.0	1683.0	2576.0	...	3709.0	3802.0	3467.0	3566.0	3077.0	3432.0	3072.0	1614.0	2172.0	115359.0
Malaysia	Asia	South-Eastern Asia	Developing regions	786.0	816.0	813.0	448.0	384.0	374.0	425.0	...	593.0	580.0	600.0	658.0	640.0	802.0	409.0	358.0	204.0	24417.0
Maldives	Asia	Southern Asia	Developing regions	0.0	0.0	0.0	1.0	0.0	0.0	0.0	...	0.0	0.0	2.0	1.0	7.0	4.0	3.0	1.0	1.0	30.0
Mongolia	Asia	Eastern Asia	Developing regions	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	59.0	64.0	82.0	59.0	118.0	169.0	103.0	68.0	99.0	952.0
Myanmar	Asia	South-Eastern Asia	Developing regions	80.0	62.0	46.0	31.0	41.0	23.0	18.0	...	210.0	953.0	1887.0	975.0	1153.0	556.0	368.0	193.0	262.0	9245.0
Nepal	Asia	Southern Asia	Developing regions	1.0	1.0	6.0	1.0	2.0	4.0	13.0	...	607.0	540.0	511.0	581.0	561.0	1392.0	1129.0	1185.0	1308.0	10222.0
Oman	Asia	Western Asia	Developing regions	0.0	0.0	0.0	8.0	0.0	0.0	0.0	...	14.0	18.0	16.0	10.0	7.0	14.0	10.0	13.0	11.0	224.0
Pakistan	Asia	Southern Asia	Developing regions	978.0	972.0	1201.0	900.0	668.0	514.0	691.0	...	14314.0	13127.0	10124.0	8994.0	7217.0	6811.0	7468.0	11227.0	12603.0	241600.0
Philippines	Asia	South-Eastern Asia	Developing regions	6051.0	5921.0	5249.0	4562.0	3801.0	3150.0	4166.0	...	18139.0	18400.0	19837.0	24887.0	28573.0	38617.0	36765.0	34315.0	29544.0	511391.0
Qatar	Asia	Western Asia	Developing regions	0.0	0.0	0.0	0.0	0.0	0.0	1.0	...	11.0	2.0	5.0	9.0	6.0	18.0	3.0	14.0	6.0	157.0
Republic of Korea	Asia	Eastern Asia	Developing regions	1011.0	1456.0	1572.0	1081.0	847.0	962.0	1208.0	...	5832.0	6215.0	5920.0	7294.0	5874.0	5537.0	4588.0	5316.0	4509.0	142581.0
Saudi Arabia	Asia	Western Asia	Developing regions	0.0	0.0	1.0	4.0	1.0	2.0	5.0	...	198.0	252.0	188.0	249.0	246.0	330.0	278.0	286.0	267.0	3425.0
Singapore	Asia	South-Eastern Asia	Developing regions	241.0	301.0	337.0	169.0	128.0	139.0	205.0	...	392.0	298.0	690.0	734.0	366.0	805.0	219.0	146.0	141.0	14579.0
Sri Lanka	Asia	Southern Asia	Developing regions	185.0	371.0	290.0	197.0	1086.0	845.0	1838.0	...	4930.0	4714.0	4123.0	4756.0	4547.0	4422.0	3309.0	3338.0	2394.0	148358.0
State of Palestine	Asia	Western Asia	Developing regions	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	453.0	627.0	441.0	481.0	400.0	654.0	555.0	533.0	462.0	6512.0
Syrian Arab Republic	Asia	Western Asia	Developing regions	315.0	419.0	409.0	269.0	264.0	385.0	493.0	...	1458.0	1145.0	1056.0	919.0	917.0	1039.0	1005.0	650.0	1009.0	31485.0
Tajikistan	Asia	Central Asia	Developing regions	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	85.0	46.0	44.0	15.0	50.0	52.0	47.0	34.0	39.0	503.0
Thailand	Asia	South-Eastern Asia	Developing regions	56.0	53.0	113.0	65.0	82.0	66.0	78.0	...	575.0	500.0	487.0	519.0	512.0	499.0	396.0	296.0	400.0	9174.0
Turkey	Asia	Western Asia	Developing regions	481.0	874.0	706.0	280.0	338.0	202.0	257.0	...	2065.0	1638.0	1463.0	1122.0	1238.0	1492.0	1257.0	1068.0	729.0	31781.0
Turkmenistan	Asia	Central Asia	Developing regions	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	40.0	26.0	37.0	13.0	20.0	30.0	20.0	20.0	14.0	310.0
United Arab Emirates	Asia	Western Asia	Developing regions	0.0	2.0	2.0	1.0	2.0	0.0	5.0	...	31.0	42.0	37.0	33.0	37.0	86.0	60.0	54.0	46.0	836.0
Uzbekistan	Asia	Central Asia	Developing regions	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	330.0	262.0	284.0	215.0	288.0	289.0	162.0	235.0	167.0	3368.0
Viet Nam	Asia	South-Eastern Asia	Developing regions	1191.0	1829.0	2162.0	3404.0	7583.0	5907.0	2741.0	...	1852.0	3153.0	2574.0	1784.0	2171.0	1942.0	1723.0	1731.0	2112.0	97146.0
Yemen	Asia	Western Asia	Developing regions	1.0	2.0	1.0	6.0	0.0	18.0	7.0	...	161.0	140.0	122.0	133.0	128.0	211.0	160.0	174.0	217.0	2985.0

	Continent	Region	DevName	1980	1981	1982	1983	1984	1985	1986	...	2005	2006	2007	2008	2009	2010	2011	2012	2013	Total
Country
Afghanistan	Asia	Southern Asia	Developing regions	16.0	39.0	39.0	47.0	71.0	340.0	496.0	...	3436.0	3009.0	2652.0	2111.0	1746.0	1758.0	2203.0	2635.0	2004.0	58639.0
Bangladesh	Asia	Southern Asia	Developing regions	83.0	84.0	86.0	81.0	98.0	92.0	486.0	...	4171.0	4014.0	2897.0	2939.0	2104.0	4721.0	2694.0	2640.0	3789.0	65568.0
Bhutan	Asia	Southern Asia	Developing regions	0.0	0.0	0.0	0.0	1.0	0.0	0.0	...	5.0	10.0	7.0	36.0	865.0	1464.0	1879.0	1075.0	487.0	5876.0
India	Asia	Southern Asia	Developing regions	8880.0	8670.0	8147.0	7338.0	5704.0	4211.0	7150.0	...	36210.0	33848.0	28742.0	28261.0	29456.0	34235.0	27509.0	30933.0	33087.0	691904.0
Iran (Islamic Republic of)	Asia	Southern Asia	Developing regions	1172.0	1429.0	1822.0	1592.0	1977.0	1648.0	1794.0	...	5837.0	7480.0	6974.0	6475.0	6580.0	7477.0	7479.0	7534.0	11291.0	175923.0
Maldives	Asia	Southern Asia	Developing regions	0.0	0.0	0.0	1.0	0.0	0.0	0.0	...	0.0	0.0	2.0	1.0	7.0	4.0	3.0	1.0	1.0	30.0
Nepal	Asia	Southern Asia	Developing regions	1.0	1.0	6.0	1.0	2.0	4.0	13.0	...	607.0	540.0	511.0	581.0	561.0	1392.0	1129.0	1185.0	1308.0	10222.0
Pakistan	Asia	Southern Asia	Developing regions	978.0	972.0	1201.0	900.0	668.0	514.0	691.0	...	14314.0	13127.0	10124.0	8994.0	7217.0	6811.0	7468.0	11227.0	12603.0	241600.0
Sri Lanka	Asia	Southern Asia	Developing regions	185.0	371.0	290.0	197.0	1086.0	845.0	1838.0	...	4930.0	4714.0	4123.0	4756.0	4547.0	4422.0	3309.0	3338.0	2394.0	148358.0

	1980	1981	1982	1983	1984	1985	1986	1987	1988	1989	...	2004	2005	2006	2007	2008	2009	2010	2011	2012	2013
Country
India	8880.0	8670.0	8147.0	7338.0	5704.0	4211.0	7150.0	10189.0	11522.0	10343.0	...	28235.0	36210.0	33848.0	28742.0	28261.0	29456.0	34235.0	27509.0	30933.0	33087.0
China	5123.0	6682.0	3308.0	1863.0	1527.0	1816.0	1960.0	2643.0	2758.0	4323.0	...	36619.0	42584.0	33518.0	27642.0	30037.0	29622.0	30391.0	28502.0	33024.0	34129.0

	Type	Coverage	OdName	AREA	AreaName	REG	RegName	DEV	DevName	1980	...	2012	2013	Unnamed: 43	Unnamed: 44	Unnamed: 45	Unnamed: 46	Unnamed: 47	Unnamed: 48	Unnamed: 49	Unnamed: 50
0	Immigrants	Foreigners	Afghanistan	935.0	Asia	5501.0	Southern Asia	902.0	Developing regions	16.0	...	2635.0	2004.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	Immigrants	Foreigners	Albania	908.0	Europe	925.0	Southern Europe	901.0	Developed regions	1.0	...	620.0	603.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	Immigrants	Foreigners	Algeria	903.0	Africa	912.0	Northern Africa	902.0	Developing regions	80.0	...	3774.0	4331.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	Immigrants	Foreigners	American Samoa	909.0	Oceania	957.0	Polynesia	902.0	Developing regions	0.0	...	0.0	0.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	Immigrants	Foreigners	Andorra	908.0	Europe	925.0	Southern Europe	901.0	Developed regions	0.0	...	1.0	1.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

Data Visualization - Line Plotting¶

Preparing the enviroment¶

Enter the url where your dataset is:¶

fetching the excel file:¶

Let's get basic information about our dataframe:¶

Let's get the list of column headers¶

Let's get the list of indicies:¶

Let's check the type of the columns and index:¶

The default type of intance variables index and columns are not list, so let's change them into list:¶

Now, let's check again:¶

Let's check the dimensions of our dataframe:¶

We are not going to use all of these columns, so let's drop some:¶

Let's rename the columns:¶

Let's add a 'Total' column that sums up the total immigrants by country over the entire period 1980 - 2013¶

Let's see how many null objects we have in the dataset¶

Let's view a quick summary of each column in our dataframe:¶

Data Slicing:¶

Let's filter on the list of countries ('Country'):¶

Some values are NaN, so let's drop them:¶

Let's filter on the list of countries and the data for years: 1980 - 1985:¶

Notice that the default index of the dataset is a numeric range from 0 to 195. This makes it very difficult to do a query by a specific country. For example to search for data on Japan, we need to know the corresponding index value, so let's make the Country column as our index:¶

Let's view the number of immigrants from Japan for the following scenarios:¶

1. The full row data (all columns)¶

2. For year 2013¶

3. For years 1980 to 1985¶

To avoid this ambuigity, let's convert the column names into strings: '1980' to '2013':¶

Now, let's declare a variable that will allow us to easily call upon the full range of years:¶

Filtering based on criteria:¶

To filter the dataframe based on a condition, we pass the condition as a boolean vector.¶

For example, Let's filter the dataframe to show the data on Asian countries (AreaName = Asia).¶

We can pass multiple criteria in the same line¶

Let's filter for AreaNAme = Asia and RegName = Southern Asia¶

Let's review the changes we have made to our dataframe.¶

Data Visualization:¶

Let's prepare the enviroment:¶

Now let's check if Matplotlib is loaded¶

Let's apply style:¶

Line Pots (Series/Dataframe)¶

Let's plot a line graph of immigration from Haiti¶

Let's change the index values of Haiti to type integer for plotting, and label the x and y axes¶

Let's annotate this spike in the plot "year 2010" (it was due to an earthquake in Haiti)¶

Let's compare the number of immigrants from India and China from 1980 to 2013:¶

Not working, we need to transpose the x and y axis:¶

Let's compare the trend of top 5 countries that contributed the most to immigration to Canada.¶

1.Get the dataset. Recall that we created a Total column that calculates cumulative immigration by country, we will sort on this column to get our top 5 countries using pandas sort_values() method.¶

2. Plot the dataframe. To make the plot more readeable, we will change the size using the figsize parameter.¶

2. Plot the dataframe. To make the plot more readeable, we will change the size using the `figsize` parameter.¶