Data Engineering Intern (Spring/Summer 2026)

Remote · USA · Full-time

Description:

  • Support the development and maintenance of data pipelines using Databricks, Spark, and similar technologies.
  • Write and optimize SQL and Python scripts for data transformation, integration, and automation tasks.
  • Develop automation scripts that populate metadata and comments across Databricks tables using structured definitions such as CSV files.
  • Assist in building a proof-of-concept for an automated data dictionary maintained with existing Databricks metadata.
  • Contribute to prototyping an AI-powered knowledge agent that uses internal data and documentation to answer common questions.
  • Collaborate with team members to improve data quality, cataloging, and metadata management across the ecosystem.
  • Participate in code reviews, design discussions, and sprint ceremonies to learn engineering best practices.
  • Document findings, workflows, and automation processes for future reuse.
  • Perform other duties as assigned.

Requirements:

  • Actively pursuing a Bachelor’s or Master’s degree in Computer Science, Software Engineering, Information Systems, or a related technical field.
  • Foundational knowledge of Python and SQL for data manipulation and analysis.
  • Familiarity with ETL concepts and structured data formats such as CSV, JSON, and Parquet.
  • Interest in cloud-based data platforms, with Azure preferred.
  • Strong analytical and problem-solving skills with an eagerness to learn.
  • Effective communication and teamwork skills.
  • Exposure to Databricks, Apache Spark, or other distributed data frameworks is preferred.
  • Familiarity with Git or version control practices is preferred.
  • Interest in AI/LLM-based automation, data documentation, or metadata management is preferred.
  • Prior project or internship experience in data engineering or cloud technologies is preferred.
