Year: 2022 | Volume: 9 | Issue: 1 | Page: 70-76
How to conduct descriptive statistics online: A brief hands-on guide for biomedical researchers
Himel Mondal1, Sharada Mayee Swain2, Shaikat Mondal3
1 Department of Physiology, Fakir Mohan Medical College and Hospital, Balasore, Odisha, India
2 Department of Physiology, Hi-Tech Medical College and Hospital, Bhubaneswar, Odisha, India
3 Department of Physiology, Raiganj Government Medical College and Hospital, Raiganj, West Bengal, India
Date of Submission: 16-Oct-2021
Date of Acceptance: 25-Oct-2021
Date of Web Publication: 23-Mar-2022
Department of Physiology, Fakir Mohan Medical College and Hospital, Balasore, Odisha
Source of Support: None, Conflict of Interest: None
Background: Descriptive statistics is the first step of data analysis. In biomedical research, descriptive statistics are computed first to summarize the data, and inferential statistical tests are conducted afterward. Many resource-limited settings may not have dedicated software for carrying out these analyses. Aim: This article aimed to provide a brief technical guide to conducting descriptive statistics, with visualization, without any dedicated statistical software package. Methods: We searched for online tools that provide free services for descriptive statistics. Example data were fabricated to run the tests online. The visualizations of the data (i.e., figures) are explained in brief, wherever necessary. Results: We described the method to graph and summarize the data using a pie chart, frequency table, stem and leaf display, histogram, frequency polygon, box plot, bar chart, stacked bar chart, line graph, dot plot, central tendency, variance, quantile-quantile plot, scatter plot, and Venn diagram. All these tests and visualizations were done online without any installed dedicated software package. Conclusion: This article provides a brief technical guide for conducting common descriptive statistical tests online. Researchers in resource-limited settings may use these services to summarize and visualize their data online on public domain websites.
Keywords: Data analysis, descriptive statistics, research design, software, statistical analysis
How to cite this article:
Mondal H, Swain SM, Mondal S. How to conduct descriptive statistics online: A brief hands-on guide for biomedical researchers. Indian J Vasc Endovasc Surg 2022;9:70-6
How to cite this URL:
Mondal H, Swain SM, Mondal S. How to conduct descriptive statistics online: A brief hands-on guide for biomedical researchers. Indian J Vasc Endovasc Surg [serial online] 2022 [cited 2022 May 28];9:70-6. Available from: https://www.indjvascsurg.org/text.asp?2022/9/1/70/340492
Introduction
For biomedical research, data are collected from a sample to draw a conclusion about the population from which the sample was recruited. In the data analysis flow, a researcher first obtains a summary (e.g., mean and variance) of the data from the sample. This is known as descriptive statistics. When these data are further analyzed (e.g., comparing the means of two groups, analysis of variance) to draw conclusions about the population, it is known as inferential statistics. In published biomedical research, inferential statistics are typically conducted after descriptive statistics, and the results are presented along with descriptive statistics such as percentage, mean, median, and standard deviation.
The first step of descriptive statistics is sorting the data into groups. Sometimes, sorted or grouped data are presented with figures to aid understanding of the sample data. The next step is to summarize the data, commonly with a measure of central tendency and a measure of spread, such as mean and standard deviation or median and interquartile range.
Although descriptive statistics for a small number of observations can be calculated manually, a large data set needs a calculator. Several spreadsheet software packages (both free and paid) can help with some descriptive statistics. However, they do not provide a full range of descriptive statistical tests with presentable visual output. In addition, dedicated software packages (both free and paid) are available for statistical analysis. However, in some settings, novice researchers may not have access to these software packages.
In this context, we aimed to provide a brief technical guide on how anyone can conduct descriptive statistical tests online without any dedicated software. A computer connected to the internet is, however, a prerequisite for the tests. We presume that this compilation would help novice researchers in resource-limited settings carry out descriptive statistics with ease.
Methods
This study does not involve any human or animal research participants. All the data were fabricated for the conduct of the tests online. The websites that we described in this study provide free service through public domain websites. Hence, no ethical clearance was obtained for this study.
The basic concept of variable
In [Figure 1], we present a part of a study report to show the types of variables. A variable is a characteristic that can be measured, either qualitatively or quantitatively. Qualitative variables are also called categorical variables. They are dichotomous (2 categories), nominal (≥3 categories without any order), or ordinal (≥3 categories in order). Quantitative variables are also called numerical variables. They are continuous (measured on a continuous scale) or discrete (can take only certain numerical values).
The websites used in this study are listed in [Table 1]. For each test, we described a single website, although multiple websites may offer a particular test; readers are encouraged to explore others. Conversely, some websites offer multiple descriptive statistical tests, but we tried to keep the list diverse. We checked the websites and included those that provide the test without any registration or fees. High-resolution images of the results can be found in the supplementary file.
Table 1: Common descriptive statistical tests and websites offering free conduct of the tests
Descriptive statistics and visualization
In the section below, we describe how anyone can conduct the tests online and save the results for presentation in a manuscript. All the websites were live when we wrote this article. However, we cannot guarantee that the websites will remain freely available.
Pie chart
You conducted a survey on the first preference of bibliographic database among a sample of 169 biomedical researchers. You would like to present the qualitative/categorical data as relative frequencies in a pie chart.
- Go to https://www.statskingdom.com/chart-maker.html
- Select the “Type” as “Pie Chart;” write the title “Preference of bibliographic database;” write the “Category” titles serially, replacing A, B, C, etc. (you can copy and paste the categories too); type the corresponding values in “Group-1.” Click on the “Calculate” button
- Click on “Save image” and the image will open in a new window. Now, right click on the image and save it using the “Save image as” option.
The pie chart with percentage is shown in [Figure 2]a. Although the pie chart is visually appealing, its usage in presenting small data (e.g., number of male, female, and intersex) is not suggested as it can be expressed in number and percentage as text.
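For readers who want to check the percentages offline, the relative frequencies behind a pie chart can be sketched in Python as follows. This is only an illustration, not the website's method; the category names and counts are hypothetical and were chosen only so that they sum to 169 respondents.

```python
def relative_frequencies(counts):
    """Return {category: percentage of the total}, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Hypothetical counts (not the article's data); they sum to 169 respondents.
preferences = {"PubMed": 80, "Scopus": 40, "Web of Science": 30, "Other": 19}

percentages = relative_frequencies(preferences)
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

Each percentage corresponds to the share of the circle occupied by that category's slice.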
Figure 2: Descriptive statistics visualization – (a) pie chart, (b) frequency table, (c) stem and leaf display, (d) histogram, (e) frequency polygon, (f) box plot, (g) bar chart, (h) stacked bar chart, and (i) line graph. High-resolution figures are available in supplementary file
Frequency table
You recorded the age of 35 athletes in completed years. You would like to get the frequency distribution of your numerical/quantitative discrete data.
- Go to https://www.socscistatistics.com/descriptive/frequencydistribution/default.aspx
- Copy the numbers and paste them in the box. Click on the “Generate” button
- You need to take a screenshot as the website does not provide an option to save the table as an image. You can also change the number of classes according to your choice from the “Edit Frequency Table” option.
The frequency table is shown in [Figure 2]b. According to the class distribution, the largest number of athletes (10 [28.6%]) were in the 23–27 years age class.
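The grouping that such a frequency table performs can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the ages are made-up values (not the 35 athletes in the supplementary file), and the starting value and class width of 5 years are choices made here, not settings taken from the website.

```python
from collections import Counter


def frequency_table(values, start, width):
    """Group integer values into classes of equal width.

    Returns a list of (class label, count, percentage) tuples in ascending order.
    """
    # Map each value to the lower limit of the class it falls in.
    counts = Counter(start + ((v - start) // width) * width for v in values)
    n = len(values)
    return [
        (f"{lower}-{lower + width - 1}", counts[lower], round(100 * counts[lower] / n, 1))
        for lower in sorted(counts)
    ]


# Hypothetical ages in completed years (illustrative only).
ages = [18, 19, 21, 22, 23, 23, 24, 25, 26, 27, 28, 29, 31, 33, 36]

rows = frequency_table(ages, start=18, width=5)
for label, count, pct in rows:
    print(f"{label}: {count} ({pct}%)")
```

Changing `width` has the same effect as changing the number of classes on the website.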
Stem and leaf display
You have recorded the age of 30 research participants in completed years. The data set is not large; hence, it can be presented in a stem and leaf display.
- Go to https://www.calculatorsoup.com/calculators/statistics/stemleaf.php
- Copy the data and paste it in the box below “Enter Data Set.” Click on the “Calculate” button
- You need to take a screenshot of the result.
The stem and leaf plot is shown in [Figure 2]c. Each stem (1, 2, 3, and 4) is multiplied by 10 and the leaf value is added to get the actual value. In the plot shown in [Figure 2]c, one participant was 18 years old, one was 19, one was 20, one was 21, two were 22, and so on.
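The stem and leaf rule described above (value = stem × 10 + leaf) can be sketched in Python. The ages below are hypothetical and only illustrate the mechanics for two-digit values; they are not the 30 participants from the supplementary file.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Split each two-digit value into a stem (tens digit) and a leaf (units digit)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)


# Hypothetical ages in completed years (illustrative only).
ages = [18, 19, 20, 21, 22, 22, 25, 31, 34, 40]

plot = stem_and_leaf(ages)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Reading the row for stem 2 with leaves 0, 1, 2, 2, and 5 gives the ages 20, 21, 22, 22, and 25.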
Histogram
You measured the weight (in kg) of 20 research participants and you wanted to make a histogram with this data set.
- Go to https://www.aatbio.com/tools/online-histogram-maker
- Paste the data under “Data Entry;” click on “Process data.” Click on “Calculate histogram.” You may also click and rename the X axis, Y axis, and the title of the graph
- Right click on the histogram and click on “Download Graph.” If it does not work, take a screenshot of the graph.
The histogram is shown in [Figure 2]d. The weight is plotted on the X-axis, and the number of observations is shown on the Y-axis. The histogram provides a rough idea about the distribution of the data.
Frequency polygon
You measured the heart rates (in beats per minute) of 17 research participants in three grades of exercises. You would like to check the comparative distribution shape (i.e., comparing three histograms) of the numerical continuous data.
- Go to https://www.socscistatistics.com/descriptive/polygon/default.aspx
- Copy each column of data and paste them in “Distribution 1,” “Distribution 2”, and “Distribution 3.” Click on the “Generate” button
- Right click on the image and use “Save image as” option to save the figure. You can also customize the axis name, series names and number of classes from “Edit Polygon” option.
From [Figure 2]e, distribution shape of heart rate in mild, moderate, and vigorous exercise can be observed. The frequency and polygon table details are also shown in the result page.
Box plot
You measured the body weight of 14 sedentary, 14 active, and 14 athlete research participants. You wanted to compare these three sets of data with a box-and-whisker graph.
- Go to https://goodcalculators.com/box-plot-maker
- Copy data and paste in “Group 1,” “Group 2,” and “Group 3;” name “Y-axis Title” as “Weight (kg).” Click on the “Draw” button
- Click on “Save as Portable Network Graphics (.png)” to save the box plot image.
[Figure 2]f shows the box plot of the weight of the three groups. The “+Add Group” option can be used when you have more than three groups. In the box plot, the upper whisker (upper line) corresponds to the maximum value and the lower whisker to the minimum value. The bold line in the box indicates the median, and the box itself spans the interquartile range (from quartile 1 [25th percentile] at the bottom to quartile 3 [75th percentile] at the top). The round marker inside the box indicates the mean, and a round marker beyond the minimum or maximum whisker indicates an outlier. In this example, there was no outlier.
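The numbers a box plot displays can be computed with Python's standard `statistics` module, as in the sketch below. Several points are assumptions made for this illustration: the weights are made-up values, the quartiles use the module's default ("exclusive") method, which may differ slightly from the method the website uses, and outliers are flagged with the common 1.5 × interquartile range rule, which the website may or may not apply.

```python
import statistics


def box_plot_summary(values):
    """Return the values a box plot displays, plus any 1.5 x IQR outliers."""
    ordered = sorted(values)
    # Three cut points dividing the data into four equal groups.
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "outliers": [v for v in ordered if v < low_fence or v > high_fence],
    }


# Hypothetical body weights in kg for one group of 14 participants.
weights = [52, 55, 58, 60, 61, 63, 64, 66, 68, 70, 72, 75, 78, 80]

summary = box_plot_summary(weights)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Running the function once per group gives the values needed to compare the three boxes side by side.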
Bar chart
You collected the number of publications of two authors in the last 5 years in PubMed Central. You wanted a comparison bar chart of the publications.
- Go to https://www.statskingdom.com/chart-maker.html
- Keep the “Type” as “Bar Chart;” write the title in the “Title” box; copy and paste the years in the “Category” column and the values of the first author in the “Group-1” column; click on “Insert column” and paste the values of the second author in “Group-2.” Click on the “Calculate” button. You can customize the chart by clicking on the “More option;” you can name the axes, fix the range, and change the colors.
- Click on “Save image;” when the image is shown, right click on it and use “Save image as” to save it.
The comparative numbers of publications of the two authors are shown in [Figure 2]g. More columns can be added as needed.
Stacked bar chart
You conducted a survey with 32 research participants on the knowledge about, attitude toward, and practice on COVID-19. The result was coded in three categories – correct, wrong, and equivocal. You wanted to make a stacked bar chart of the finding.
- Go to https://graphmaker.imageonline.co/stackedbarchart.php
- Write the “Line Names” as “Correct,” “Wrong,” and “Equivocal;” write the “Chart Title” as “Knowledge, Attitude, and Practice on COVID-19” and “X Axis” as “Responses;” edit the “Input chart parameters” according to the values (e.g., Knowledge: 23, 12, 7); delete the unnecessary rows. The chart would be automatically generated.
- Click on the “Download-Chart” to download the image file of the chart.
The stacked bar chart is shown in [Figure 2]h. From the chart, comparative correct, wrong, and equivocal responses could be observed.
Line graph
You collected data on the number of publications on Yoga and Acupuncture available in PubMed in the last 10 years. You wanted to compare the trend visually over this 10-year period.
- Go to https://www.rapidtables.com/tools/line-graph.html
- Write the “Graph title” as “Yoga and Acupuncture publication in last 10 years;” name the “Horizontal axis” as “Year” and “Vertical axis” as “Number of publication;” write the years in “Data labels” separated by comma and a space, select the “Number of lines” as “2 lines;” copy and paste the data of Yoga publication in “Line 1 data values” and Acupuncture publication number in “Line 2 data values;” you can make the line curved by selecting the “Curved line” option. Click on the “Draw” button
- Click on the download icon to save the image.
The number of publications on Yoga and Acupuncture in the last 10 years in PubMed is shown in [Figure 2]i. More lines can be added to the figure according to the number of data sets.
Dot plot
Twenty research participants completed a Stroop test, and the time was recorded in seconds. You would like to make a dot plot to graph your numerical data.
- Go to https://www.geogebra.org/m/BxqJ4Vag
- Copy the data set and paste it in Column A. The plot will be generated
- Take a screenshot of the plot.
The dot plot is shown in [Figure 3]a. The plot also includes the mean, median, and standard deviation.
Figure 3: Descriptive statistics visualization – (a) dot plot, (b) central tendency, (c) variance, (d) quantile-quantile plot, (e) scatter plot, and (f) Venn diagram. High-resolution figures are available in supplementary file
Central tendency
You measured the body weight of 34 employees and you wanted to get an idea about their mean weight.
- Go to https://www.calculatorsoup.com/calculators/statistics/mean-median-mode.php
- Copy the data and paste it in the box “Enter Data Set.” Click on the “Calculate” button
- Take a screenshot of the result to save it for any future need.
The calculator shows the mean, median, mode, minimum, maximum, quartiles, and interquartile range [Figure 3]b. Hence, the central tendency of both normally distributed data (commonly expressed as mean and standard deviation) and non-normally distributed data (commonly expressed as median and quartile 1–quartile 3) can be reported from this result.
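The same summary values can be reproduced with Python's standard `statistics` module. The sketch below uses ten made-up weights rather than the 34 employees in the supplementary file, so the numbers are illustrative only.

```python
import statistics

# Hypothetical body weights in kg (illustrative only).
weights = [58, 60, 62, 62, 65, 67, 70, 72, 75, 79]

mean = statistics.mean(weights)      # arithmetic average
median = statistics.median(weights)  # middle value of the sorted data
mode = statistics.mode(weights)      # most frequent value
sd = statistics.stdev(weights)       # sample standard deviation (n - 1 denominator)

print(f"Mean = {mean}, median = {median}, mode = {mode}, SD = {sd:.2f}")
print(f"Minimum = {min(weights)}, maximum = {max(weights)}")
```

For data that are not normally distributed, the median together with the quartiles would usually be reported instead of the mean and standard deviation.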
Variance
You measured the body weight of 34 employees and you wanted to get an idea about the variance of the data set. Variance indicates how far the numbers are spread from the mean and from each other. It is calculated by averaging the squared differences from the mean.
- Go to https://statscalculator.com
- Clear the data already present in the “Observations” box; paste your data in the box. Click on the “Calculate” button
- Take a screenshot of the output for further usage.
[Figure 3]c shows part of the result. When the variance is calculated for a population, the denominator in the calculation is the number of observations (n). When the variance is calculated for a sample, the denominator is n − 1 [Figure 3]c.
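The role of the denominator can be made concrete with a short Python sketch that follows the usual convention (n − 1 for a sample, n for a population). The five weights are made-up values, and the hand-written function is cross-checked against `statistics.variance` and `statistics.pvariance` from the standard library.

```python
import statistics


def variance(values, sample=True):
    """Average squared deviation from the mean.

    Divides by n - 1 for a sample and by n for a whole population.
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)


# Hypothetical body weights in kg (illustrative only); the mean is 65.
weights = [60, 62, 65, 68, 70]

sample_var = variance(weights, sample=True)       # 68 / 4
population_var = variance(weights, sample=False)  # 68 / 5

print(f"Sample variance = {sample_var}")
print(f"Population variance = {population_var}")
print(statistics.variance(weights), statistics.pvariance(weights))
```

Because the squared deviations sum to 68, the sample variance (68/4) is larger than the population variance (68/5); the gap shrinks as the number of observations grows.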
Quantile-quantile plot
You measured the body weight of 34 employees and you wanted to get the normal Quantile-Quantile (Q-Q) plot of the distribution. A quantile is a point that divides the observations into equal groups. For example, the median is a quantile that divides the sample into two equal parts; quartiles are quantiles that divide the observations into four equal groups.
- Go to http://www.wessa.net/rwasp_varia1.wasp#output
- Clear the data present in the “Data” box; copy and paste the data in the box. Click on the “Compute” button
- Click on the “New Window” to open the image in a new window. Right click on the image and use “Save Image As” option to save the image.
The Q-Q plot is shown in [Figure 3]d. The X-axis plots theoretical quantiles and the Y-axis plots sample quantiles. The Q-Q plot is a graphical method for getting a rough idea of the nature of the distribution. When the data are normally distributed, the points fall approximately along the 45° reference line. Our data did not show that pattern. Hence, they may not be normally distributed.
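The coordinates of a normal Q-Q plot can be sketched with the standard library's `statistics.NormalDist`. This is an illustration under assumptions: the weights are made-up values, and the plotting positions (i − 0.5)/n are one common convention that may differ from the one the website uses.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Pair each sorted observation with its theoretical standard normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting position (i - 0.5) / n for the i-th smallest observation.
    theoretical = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical body weights in kg (illustrative only).
weights = [58, 60, 62, 62, 65, 67, 70, 72, 75, 79]

points = normal_qq_points(weights)
for x, y in points:
    print(f"theoretical = {x:6.2f}, sample = {y}")
```

Plotting the first value of each pair on the X-axis and the second on the Y-axis gives the Q-Q plot; points that follow a straight line suggest an approximately normal distribution.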
Scatter plot
You measured the height (cm) and weight (kg) of 19 research participants. You wanted to make a scatterplot with this bivariate (studying two variables) numerical continuous data.
- Go to https://mathcracker.com/scatter_plot
- Copy the height data and paste it in “X data (comma or space separated);” copy the weight data and paste it in “Y data (comma or space separated);” write the “Type the title (optional)” as “Relationship of height and weight;” put “Height (cm)” in “Name of X variable (optional)” and “Weight (kg)” in “name of Y variable (optional).” Click on the “GRAPH IT” button
- Right click on the graph and use “Save image as” option to save the image.
The scatterplot is shown in [Figure 3]e. This graph is purely descriptive and does not show any regression line or correlation.
Venn diagram
You collected data on the five most liked chapters in physiology from three groups of students. You would like to check and visually present their common and unique choices in a Venn diagram.
- Go to https://bioinformatics.psb.ugent.be/webtools/Venn
- Paste the chapter names chosen by the first group of students in the “list 1” box; fill “Provide name for the list (optional)” as “Group 1;” do the same for group 2 and group 3. Click on the “Submit” button
- Click on the “Save Image As PNG” to save the image.
The Venn diagram is shown in [Figure 3]f. The website also shows the text about the overlapping and uniqueness of items (chapters). You can either save it as screenshot or save it as text by clicking on “Save text result.”
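The overlapping and unique items that a Venn diagram displays correspond to set intersections and differences, which can be sketched with Python's built-in sets. The chapter names below are hypothetical and are not the lists from the supplementary file.

```python
# Hypothetical five most liked chapters per group (illustrative only).
group1 = {"Blood", "Cardiovascular", "Respiratory", "Renal", "Endocrine"}
group2 = {"Cardiovascular", "Respiratory", "Gastrointestinal", "Endocrine", "Reproductive"}
group3 = {"Cardiovascular", "Nervous", "Renal", "Endocrine", "Special senses"}

common_to_all = group1 & group2 & group3   # centre of the diagram
only_group1 = group1 - group2 - group3     # chosen by group 1 alone
only_group2 = group2 - group1 - group3
only_group3 = group3 - group1 - group2
groups_1_and_2_only = (group1 & group2) - group3

print("Common to all groups:", sorted(common_to_all))
print("Unique to group 1:", sorted(only_group1))
print("Unique to group 2:", sorted(only_group2))
print("Unique to group 3:", sorted(only_group3))
print("Shared by groups 1 and 2 only:", sorted(groups_1_and_2_only))
```

Each printed list corresponds to one region of the three-circle diagram.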
Results
We were able to conduct common descriptive statistical tests online on public domain websites. We only included websites that provide their service without requiring an account. Hence, anyone can just open the websites and conduct the tests. We listed brief guides to graph and summarize data using a pie chart, frequency table, stem and leaf display, histogram, frequency polygon, box plot, bar chart, stacked bar chart, line graph, dot plot, central tendency, variance, quantile-quantile plot, scatter plot, and Venn diagram.
The output visualization can be found in [Figure 2] and [Figure 3]. The list of the websites can be found in [Table 1]. For a particular test, associated data, website link, and high-resolution image can be found in the supplementary file available in Figshare (http://dx.doi.org/10.6084/m9.figshare.16903072).
Discussion
Biomedical researchers occasionally receive formal training in biostatistics. However, a universal hands-on training program for all researchers is still needed to create a competent pool of physician-researchers. In India, postgraduate medical students and medical teachers are currently trained in research methodology as a compulsory step toward eligibility for university examinations or promotion. We expect that this will boost the research competency of future physician-researchers.
In this article, we provided a quick hands-on guide to conduct various types of descriptive statistical tests. Descriptive statistics is the first step to organize, summarize, and visualize the data. The inferential tests come later. Anyone can download the fabricated example data and conduct the tests themselves following the steps to get a real-life experience. We encourage them to try the tests with their data for further experience.
The tests we described can be done online (in an internet browser) without any dedicated, installed software package. Hence, any user can conduct the tests even on public access computers that are connected to the internet. However, the websites we described here may discontinue their services at any time. For this reason, we included diverse websites so that researchers have alternatives if needed.
Conclusion
This article provided a brief technical guide on how to conduct common descriptive statistics online and free of cost. Any researcher can carry out these tests without any dedicated and costly statistical software packages. However, a computer and an internet connection are the minimum requirements. Data visualization has also been described. The output visual elements can be used for presenting the data in a seminar or a manuscript. We presume that this article would help novice researchers in resource-limited settings where institutional access to statistical software is not available.
We thank Sarika Mondal and Ahana Aarshi for their support during the preparation of the manuscript.
Financial support and sponsorship
None.
Conflicts of interest
There are no conflicts of interest.
References
Guetterman TC. Basics of statistics for primary care research. Fam Med Community Health 2019;7:e000067.
Kaliyadan F, Kulkarni V. Types of variables, descriptive statistics, and sample size. Indian Dermatol Online J 2019;10:82-6.
Nick TG. Descriptive statistics. Methods Mol Biol 2007;404:33-52.
Mayya SS, Monteiro AD, Ganapathy S. Types of biological variables. J Thorac Dis 2017;9:1730-3.
Kaur P, Stoltzfus J, Yellapu V. Descriptive statistics. Int J Acad Med 2018;4:60-3.
Duquia RP, Bastos JL, Bonamigo RR, González-Chica DA, Martínez-Mesa J. Presenting data in tables and charts. An Bras Dermatol 2014;89:280-5.
Manikandan S. Frequency distribution. J Pharmacol Pharmacother 2011;2:54-6.
Hazra A, Gogtay N. Biostatistics series module 1: Basics of biostatistics. Indian J Dermatol 2016;61:10-20.
Shreffler J, Huecker MR. Exploratory Data Analysis: Frequencies, Descriptive Statistics, Histograms, and Boxplots. In: StatPearls. Treasure Island (FL): StatPearls Publishing; 2021. Available from: https://www.ncbi.nlm.nih.gov/books/NBK557570/. [Last updated on 2021 Mar 01].
He Y, Yu X, Gan Y, Zhu T, Xiong S, Peng J, et al. Bar charts detection and analysis in biomedical literature of PubMed Central. AMIA Annu Symp Proc 2017;2017:859-65.
Streit M, Gehlenborg N. Bar charts and box plots. Nat Methods 2014;11:117.
Peebles D, Ali N. Expert interpretation of bar and line graphs: The role of graphicacy in reducing the effect of graph format. Front Psychol 2015;6:1673.
Cornelius V, Cro S, Phillips R. Advantages of visualisations to evaluate and communicate adverse event information in randomised controlled trials. Trials 2020;21:1028.
Rodrigues CFS, Lima FJC, Barbosa FT. Importance of using basic statistics adequately in clinical research. Rev Bras Anestesiol 2017;67:619-25.
Voorman A, Lumley T, McKnight B, Rice K. Behavior of QQ-plots and genomic control in studies of gene-environment interaction. PLoS One 2011;6:e19416.
Slutsky DJ. The effective use of graphs. J Wrist Surg 2014;3:67-8.
Chen H, Boutros PC. VennDiagram: A package for the generation of highly-customizable Venn and Euler diagrams in R. BMC Bioinformatics 2011;12:35.
Federer LM, Lu YL, Joubert DJ. Data literacy training needs of biomedical researchers. J Med Libr Assoc 2016;104:52-7.
The cultures of academic medicine in India. Natl Med J India 2019;32:308-10.